Preface


It is our great pleasure to present the proceedings of Asiacrypt 2013 in two 
volumes of Lecture Notes in Computer Science published by Springer. This was 
the 19th edition of the International Conference on Theory and Application of 
Cryptology and Information Security held annually in Asia by the International 
Association for Cryptologic Research (IACR). The conference was organized by 
IACR in cooperation with the Cryptology Research Society of India and was 
held in the city of Bengaluru in India during December 1-5, 2013. 

About one year prior to the conference, an international Program Committee 
(PC) of 46 scientists assumed the responsibility of determining the scientific 
content of the conference. The conference evoked an enthusiastic response from 
researchers and scientists. A total of 269 papers were submitted for possible 
presentations approximately six months before the conference. Authors of the 
submitted papers are spread all over the world. PC members were allowed to 
submit papers, but each PC member could submit at most two co-authored 
papers or at most one single-authored paper. The PC co-chairs did not submit 
any paper. All the submissions were screened by the PC and 54 papers were 
finally selected for presentations at the conference. These proceedings contain 
the revised versions of the papers that were selected. The revisions were not 
checked and the responsibility of the papers rests with the authors and not the 
PC members. 

Selection of papers for presentation was made through a double-blind re- 
view process. Each paper was assigned three reviewers and submissions by PC 
members were assigned six reviewers. Apart from the PC members, 291 external 
reviewers were involved. The total number of reviews for all the papers was more 
than 900. In addition to the reviews, the selection process involved an extensive 
discussion phase. This phase allowed PC members to express opinions on all the
submissions. The final selection of 54 papers was the result of this extensive and 
rigorous selection procedure. One of the final papers resulted from the merging 
of two submissions. 

The best paper award was conferred upon the paper “Shorter Quasi-Adaptive
NIZK Proofs for Linear Subspaces” authored by Charanjit Jutla and Arnab Roy.
The decision was based on a vote among the PC members. In addition to the
best paper, the authors of two other papers, namely, “Families of Fast Elliptic
Curves from Q-Curves” authored by Benjamin Smith and “Key Recovery Attacks
on 3-Round Even-Mansour, 8-Step LED-128, and Full AES²” authored by Itai
Dinur, Orr Dunkelman, Nathan Keller and Adi Shamir, were recommended by 
the Editor-in-Chief of the Journal of Cryptology to submit expanded versions to 
the journal. 

A highlight of the conference was the invited talks. An extensive multi-round 
discussion was carried out by the PC to decide on the invited speakers. This 
resulted in very interesting talks on two different aspects of the subject. Lars
Ramkilde Knudsen spoke on “Block Ciphers – Past and Present,” a topic of
classical and continuing importance, while George Danezis spoke on “Engineering 
Privacy-Friendly Computations,” which is an important and a more modern 
theme. 

Apart from the regular presentations and the invited talks, a rump session 
was organized on one of the evenings. This consisted of very short presentations 
on upcoming research results, announcements of future events, and other topics 
of interest to the audience. 

We would like to thank the authors of all papers for submitting their research 
works to the conference. Such interest over the years has ensured that the Asi- 
acrypt conference series remains a cherished venue of publication by scientists. 
Thanks are due to the PC members for their enthusiastic and continued partic- 
ipation for over a year in different aspects of selecting the technical program. 
External reviewers contributed by providing timely reviews and thanks are due 
to them. A list of external reviewers is provided in these proceedings. We have 
tried to ensure that the list is complete. Any omission is inadvertent and if there 
is an omission, we apologize to the person concerned. 

Special thanks are due to Satyanarayana V. Lokam, the general chair of 
the conference. His message to the PC was to select the best possible scientific 
program without any other considerations. Further, he ensured that the PC co- 
chairs were insulated from the organizational work. This work was done by the 
Organizing Committee and they deserve thanks from all the participants for 
the wonderful experience. We thank Daniel J. Bernstein and Tanja Lange for 
expertly organizing and conducting the rump session. 

The reviews and discussions were carried out entirely online using software
developed by Shai Halevi. Several times, we had to ask Shai for help with
one feature or another of the software. Every time, we received immediate
and helpful responses. We thank him for his support and also for developing the 
software. We also thank Josh Benaloh, who was our IACR liaison, for guidance 
on several issues. Springer published the volumes and made these available before 
the conference. We thank Alfred Hofmann and Anna Kramer and their team for 
their professional and efficient handling of the production process. 

Last but not least, we thank Microsoft Research; Google; the Indian Statistical
Institute, Kolkata; and the National Mathematics Initiative, Indian Institute of
Science, Bengaluru, for being generous sponsors of the conference.
Abstract. In the 1980s researchers were trying to understand the design
of the DES, and breaking it seemed impossible. Other block ciphers
were proposed, and cryptanalysis of block ciphers got interesting. The 
area took off in the 1990s where it exploded with the appearance of dif- 
ferential and linear cryptanalysis and the many variants thereof which 
appeared in the time after. In the 2000s AES became a standard and 
it was constructed specifically to resist the general attacks and the area 
of (traditional) block cipher cryptanalysis seemed saturated.... Much of 
the progress in cryptanalysis of the AES since then has come from side- 
channel attacks and related-key attacks. 

Still today, for most block cipher applications the AES is a good 
and popular choice. However, the AES is perhaps not particularly well 
suited for extremely constrained environments such as RFID tags. There- 
fore, one trend in block cipher design has been to come up with ultra- 
lightweight block ciphers with good security and hardware efficiency. I 
was involved in the design of the ciphers Present (from CHES 2007), 
PrintCipher (presented at CHES 2010) and PRINCE (from Asiacrypt 
2012). Another trend in block cipher design has been to try to increase the
efficiency by making certain components part of the secret key, e.g., to 
be able to reduce the number of rounds of a cipher. 

In this talk, I will review these results. 


Engineering Privacy-Friendly Computations 


George Danezis 1,2 

1 University College London 
2 Microsoft Research, Cambridge 


Abstract. In the past few years tremendous cryptographic progress has 
been made in relation to primitives for privacy-friendly computations.
These include celebrated results around fully homomorphic encryption,
faster somewhat homomorphic encryption, and ways to leverage them to
support more efficient secret-sharing based secure multi-party computations.
Similar breakthroughs in verifiable computation, and succinct
arguments of knowledge, make it practical to verify complex computa- 
tions, as part of privacy-preserving client side program execution. Besides 
computations themselves, notions like differential privacy attempt to cap- 
ture the essence of what it means for computations to leak little personal 
information, and have been mapped to existing data query languages. 

So, is the problem of computation on private data solved, or just about 
to be solved? In this talk, I argue that the models of generic computation 
supported by cryptographic primitives are complete, but rather removed 
from what a typical engineer or data analyst expects. Furthermore, the 
use of these cryptographic technologies imposes constraints that require
fundamental changes in the engineering of computing systems. While 
those challenges are not obviously cryptographic in nature, they are nev- 
ertheless hard to overcome, have serious performance implications, and 
errors open avenues for attack. 

Throughout the talk I use examples from our own work relating to 
privacy-friendly computations within smart grid and smart metering de- 
ployments for private billing, privacy-friendly aggregation, statistics and 
fraud detection. These experiences have guided the design of ZQL, a 
cryptographic language and compiler for zero-knowledge proofs, as well 
as more recent tools that compile using secret-sharing based primitives. 
Abstract. We define a novel notion of quasi-adaptive non-interactive
zero knowledge (NIZK) proofs for probability distributions on parametri- 
zed languages. It is quasi-adaptive in the sense that the common reference 
string (CRS) generator can generate the CRS depending on the language 
parameters. However, the simulation is required to be uniform, i.e., a sin- 
gle efficient simulator should work for the whole class of parametrized 
languages. For distributions on languages that are linear subspaces of 
vector spaces over bilinear groups, we give quasi-adaptive computation- 
ally sound NIZKs that are shorter and more efficient than Groth-Sahai 
NIZKs. For many cryptographic applications quasi-adaptive NIZKs suf- 
fice, and our constructions can lead to significant improvements in the 
standard model. Our construction can be based on any k-linear assumption,
and in particular under the external Diffie-Hellman (XDH) assumption
our proofs are even competitive with Random-Oracle based
Σ-protocol NIZK proofs.

We also show that our system can be extended to include integer 
tags in the defining equations, where the tags are provided adaptively by 
the adversary. This leads to applicability of our system to many applica- 
tions that use tags, e.g. applications using Cramer-Shoup projective hash 
proofs. Our techniques also lead to the shortest known (ciphertext) fully 
secure identity based encryption (IBE) scheme under standard static 
assumptions (SXDH). Further, we also get a short publicly-verifiable
CCA2-secure IBE scheme. 

Keywords: NIZK, Groth-Sahai, bilinear pairings, signatures, 
dual-system IBE, DLIN, SXDH. 


1 Introduction 

In [13] a remarkably efficient non-interactive zero-knowledge (NIZK) proof sys- 
tem [3] was given for groups with a bilinear map, which has found many appli- 
cations in design of cryptographic protocols in the standard model. All earlier 
NIZK proof systems (except [12], which was not very efficient) were constructed
by reduction to Circuit Satisfiability. Underlying this system, now commonly
known as Groth-Sahai NIZKs, is a homomorphic commitment scheme. Each 
variable in the system of algebraic equations to be proven is committed to using 
this scheme. Since the commitment scheme is homomorphic, group operations 
in the equations are translated to corresponding operations on the commitments 
and new terms are constructed involving the constants in the equations and the 
randomness used in the commitments. It was shown that these new terms along 
with the commitments to variables constitute a zero-knowledge proof [13].

While the Groth-Sahai system is quite efficient, it still falls short in comparison 
to Schnorr-based Σ-protocols [5] turned into NIZK proofs in the Random Oracle
model [2] using the Fiat-Shamir paradigm [10]. Thus, the quest remains to obtain 
even more efficient NIZK Proofs. In particular, in a linear system of rank t, 
some t of the equations already serve as commitments to t variables. Thus, the 
question arises if, at the very least, fresh commitments to these variables as done 
in Groth-Sahai NIZKs can be avoided. 

Our Contributions. In this paper, we show that for languages that are linear 
subspaces of vector spaces of the bilinear groups, one can indeed obtain more efficient
computationally-sound NIZK proofs in a slightly different quasi-adaptive
setting, which suffices for many cryptographic applications. In the quasi-adaptive
setting, we consider a class of parametrized languages $\{L_p\}$, parametrized by p,
and we allow the CRS generator to generate the CRS based on the language 
parameter p. However, the CRS simulator in the zero-knowledge setting is re- 
quired to be a single efficient algorithm that works for the whole parametrized 
class or probability distributions of languages, by taking the parameter as input. 
We will refer to this property as uniform simulation. 

Many hard languages that are commonly used in cryptography are distri- 
butions on class of parametrized languages, e.g. the DDH language based on 
the decisional Diffie-Hellman (DDH) assumption is hard only when in the tuple 
(g, f, x · g, x · f), even f is chosen at random (in addition to x · g being chosen
randomly). However, applications (or trusted parties) usually set f, once and
for all, by choosing it at random, and then all parties in the application can
use multiple instances of the above language with the same fixed f. Thus, we
can consider f as a parameter for a class of languages that only specify the last
two components above. If NIZK proofs are required in the application for this
parametrized language, then the NIZK CRS can be generated by the trusted
party that chooses the language parameter f. Hence, it can base the CRS on the
language parameter¹.

We remark that adaptive NIZK proofs [3] also allow the CRS to depend 
on the language, but without requiring uniform simulation. Such NIZK proofs 
that allow different efficient simulators for each particular language (from a 
parametrized class) are unlikely to be useful in applications. Thus, most NIZK 
proofs, including Groth-Sahai NIZKs, actually show that the same efficient 

1 However, in the security definition, the efficient CRS simulator does not itself gen- 
erate f , but is given f as input chosen randomly. 
simulator works for the whole class, i.e. they show uniform simulation. The
Groth-Sahai system achieves uniform simulation without making any distinc- 
tion between different classes of parametrized languages, i.e. it shows a single 
efficient CRS simulator that works for all algebraic languages without taking 
any language parameters as input. Thus, there is potential to gain efficiency by 
considering quasi-adaptive NIZK proofs, i.e. by allowing the (uniform) simulator 
to take language parameters as input².

Our approach to building more efficient NIZK proofs for linear subspaces is 
quite different from the Groth-Sahai techniques. In fact, our system does not 
require any commitments to the witnesses at all. If there are t free variables in 
defining a subspace of the n-dimensional vector-space and assuming the subspace 
is full-ranked (i.e. has rank t), then t components of the vector already serve as 
commitment to the variables. As an example, consider the language L (over a 
cyclic group G of order q, in additive notation) to be 

$$L = \{ (l_1, l_2, l_3) \in G^3 \mid \exists\, x_1, x_2 \in Z_q : l_1 = x_1 \cdot g,\ l_2 = x_2 \cdot f,\ l_3 = (x_1 + x_2) \cdot h \}$$

where g, f, h are parameters defining the language. Then $l_1$ and $l_2$ are already
binding commitments to $x_1$ and $x_2$. Thus, we only need to show that the last
component $l_3$ is consistent.

The main idea underlying our construction can be summarized as follows. 
Suppose the CRS can be set to be a basis for the null-space $L_p^{\perp}$ of the language
$L_p$. Then, just pairing a potential language candidate with $L_p^{\perp}$ and testing for
all-zero suffices to prove that the candidate is in $L_p$, as the null-space of $L_p^{\perp}$
is just $L_p$. However, efficiently computing null-spaces in hard bilinear groups is
itself hard. Thus, an efficient CRS simulator cannot generate $L_p^{\perp}$, but can give a
(hiding) commitment that is computationally indistinguishable from a binding
commitment to $L_p^{\perp}$. To achieve this we use a homomorphic commitment just
as in the Groth-Sahai system, but we can use the simpler El-Gamal encryption
style commitment as opposed to the more involved Groth-Sahai commitments,
and this allows for a more efficient verifier³. As we will see later in Section 5,
a more efficient verifier is critical for obtaining short identity based encryption 
schemes (IBE). 
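
The null-space test described above can be illustrated with a toy model in which every group element is replaced by its discrete logarithm modulo a small prime, so that pairing two vectors reduces to a dot product. This is only a sketch under loud assumptions: the prime `Q`, the values chosen for g, f, h and all helper names are hypothetical, and working directly with discrete logarithms is of course insecure; a real CRS would only carry commitments to the null-space basis.

```python
# Toy model: group elements are represented by their discrete logs mod Q,
# so pairing a vector with a vector is just a dot product mod Q.
Q = 101  # small prime, for illustration only


def dot(u, v):
    """Dot product mod Q (stand-in for the vector pairing)."""
    return sum(a * b for a, b in zip(u, v)) % Q


def inv(a):
    """Modular inverse mod Q (three-argument pow, Python 3.8+)."""
    return pow(a, -1, Q)


# Language parameters: hypothetical discrete logs of g, f, h.
g, f, h = 3, 5, 7
A = [[g, 0, h],
     [0, f, h]]  # the rows of A span the language L = {x . A}

# A basis vector of the null-space: orthogonal to both rows of A.
null_vec = [h * inv(g) % Q, h * inv(f) % Q, Q - 1]


def member(x1, x2):
    """Return the language member x . A for the witness (x1, x2)."""
    return [(x1 * A[0][j] + x2 * A[1][j]) % Q for j in range(3)]


def in_language(candidate):
    """A candidate is in L iff its product with the null-space basis is zero."""
    return dot(candidate, null_vec) == 0
```

For example, `in_language(member(4, 9))` holds, while changing the last component of that vector makes the test fail, because the null-space of a rank-2 matrix with three columns is one-dimensional.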

In fact, the idea of using the null-space of the language is reminiscent of 
Waters’ dual-system IBE construction EH, and indeed our system is inspired 
by that construction⁴, although the idea of using it for NIZK proofs, and in
particular the proof of soundness is novel. Another contribution of the paper is 
in the definition of quasi-adaptive NIZK proofs. 

2 It is important to specify the information about the parameter which is supplied as 
input to the CRS simulator. We defer this important issue to Section [2] where we 
formally define quasi-adaptive NIZK proofs. 

3 Our quasi-adaptive NIZK proofs are already shorter than Groth-Sahai as they require
no commitments to variables, and have to prove fewer equations, as
mentioned earlier.

4 In Section [5] and in the Appendix, we show that the design of our system leads to a 
shorter SXDH assumption based dual-system IBE. 
For n equations in t variables, our quasi-adaptive computationally-sound
NIZK proofs for linear subspaces require only k(n - t) group elements, under
the k-linear decisional assumption [23,15]. Thus, under the XDH assumption for
bilinear groups, our proofs require only (n - t) group elements. In contrast, the
Groth-Sahai system requires (n + 2t) group elements. Similarly, under the decisional
linear assumption (DLIN), our proofs require only 2(n - t) group elements,
whereas the Groth-Sahai system requires (2n + 3t) group elements. These parameters
are summarized in Table 1. While our CRS size grows proportional to
t(n - t), more importantly there is a significant comparative improvement in the
number of pairings required for verification. Specifically, under XDH we require
at most half the number of pairings, and under DLIN we require at most 2/3 the
number of pairings. The Σ-protocol NIZK proofs based on the Random Oracle
model require n group elements, t elements of $Z_q$ and 1 hash value. Although
our XDH based proofs require fewer group elements, the Σ-protocol
proofs do not require bilinear groups and have the advantage of being proofs of
knowledge (PoK). We remark that the Groth-Sahai system is also not a PoK
for witnesses that are $Z_q$ elements. A recent paper by Escala et al. [5] has also
optimized proofs of linear subspaces in a language dependent CRS setting. Their
system also removes the need for commitment to witnesses but still implicitly
uses Groth-Sahai proofs. In comparison, our proofs are still much shorter.


Table 1. Comparison with Groth-Sahai NIZKs for Linear Subspaces. Parameter t is
the number of unknowns or witnesses and n is the dimension of the vector space, or in 
other words, the number of equations. 



Thus, for the language L above, which is just a DLIN tuple used ubiquitously
for encryption, our system only requires two group elements under the
DLIN assumption, whereas the Groth-Sahai system requires twelve group elements
(note, t = 2, n = 3 in L above). For the Diffie-Hellman analogue of
this language (x · g, x · f), our system produces a single element proof under the
XDH assumption, which we demonstrate in Section 3 (whereas the Groth-Sahai
system requires (n + 2t =) 4 elements for the proof with t = 1 and n = 2).
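
The proof-size counts quoted above can be checked with a few lines of arithmetic. The sketch below only encodes the formulas stated in the text, k(n - t) for the quasi-adaptive proofs and n + 2t (XDH) or 2n + 3t (DLIN) for Groth-Sahai; the function names are hypothetical and chosen for illustration.

```python
def qa_nizk_proof_size(n, t, k):
    """Group elements in the quasi-adaptive proof under the k-linear assumption."""
    return k * (n - t)


def groth_sahai_proof_size(n, t, k):
    """Group elements in the Groth-Sahai proof, for the two cases quoted in the text."""
    if k == 1:  # XDH
        return n + 2 * t
    if k == 2:  # DLIN
        return 2 * n + 3 * t
    raise ValueError("the text only gives Groth-Sahai sizes for k = 1 and k = 2")


# DLIN tuple language L: t = 2 unknowns, n = 3 equations.
dlin_sizes = (qa_nizk_proof_size(3, 2, 2), groth_sahai_proof_size(3, 2, 2))

# Diffie-Hellman analogue (x . g, x . f): t = 1, n = 2, under XDH.
xdh_sizes = (qa_nizk_proof_size(2, 1, 1), groth_sahai_proof_size(2, 1, 1))
```

Here `dlin_sizes` evaluates to `(2, 12)` and `xdh_sizes` to `(1, 4)`, matching the two worked examples in the paragraph above.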

Our NIZK proofs also satisfy some interesting new properties. Firstly, the 
proofs in our system are unique for each language member. This has interesting 
applications as we will see later in a CCA2-IBE construction. Secondly, the CRS 
in our system, though dependent on the language parameters, can be split into 
two parts. The first part is required only by the prover, and the second part 
is required only by the verifier, and the latter can be generated independent 
of the language. This is surprising since our verifier does not even take the 
language (parameters) as input. Only the randomization used in the verifier 
CRS generation is used in the prover CRS to link the two CRSes. This is in 
sharp contrast to Groth-Sahai NIZKs, where the verifier needs the language as
input. This split-CRS property has interesting applications as we will see later. 

Extension to Linear Systems with Tags. Our system does not yet extend naturally
to quadratic or multi-linear equations, whereas the Groth-Sahai system
does⁵. However, we can extend our system to include tags, and allow the defining
equations to be polynomially dependent on tags. For example, our system can 
prove the following language: 

$$L' = \{ (l_1, l_2, l_3, \mathrm{TAG}) \in G^3 \times Z_q \mid \exists\, x_1, x_2 \in Z_q : l_1 = x_1 \cdot f,\ l_2 = x_2 \cdot g,\ l_3 = (x_1 + \mathrm{TAG} \cdot x_2) \cdot h \}.$$

Note that this is a non-trivial extension since the tag is adaptively provided by 
the adversary after the CRS has been set. 

The extension to tags is very important, as we now discuss. Many applications 
require that the NIZK proof also be simulation-sound. However, extending NIZK 
proofs for bilinear groups to be unbounded simulation-sound requires handling 
quadratic equations (see [5] for a generic construction). On the other hand, many 
applications just require one-time simulation soundness, and as has been shown 
in [13], this can be achieved for linear subspaces by projective hash proofs [7].
Projective hash proofs can be defined by linear extensions, but require use of 
tags. Thus, our system can handle such equations. Many applications, such as 
signatures, can also achieve implicit unbounded simulation soundness using projective
hash proofs, and such applications can utilize our system (see Section 5).

Applications. While the cryptographic literature is replete with NIZK proofs, 
we will demonstrate the applicability of quasi-adaptive NIZKs, and in particular 
our efficient system for linear subspaces, to a few recent applications such as signature
schemes [5], UC commitments El, password-based key exchange [16,14],
and key-dependent encryption [5]. For starters, based on HU, our system yields an
adaptive UC-secure commitment scheme (in the erasure model) that has only 
four group elements as commitment, and another four as opening (under the 
DLIN assumption; and 3 + 2 under SXDH assumption), whereas the original 
scheme using Groth-Sahai NIZKs required 5 + 16 group elements. 

We also obtain one of the shortest signature schemes under a static standard 
assumption, i.e. SXDH, that only requires five group elements. We also show 
how this signature scheme can be extended to a short fully secure (and perfectly 
complete) dual-system IBE scheme, and indeed a scheme with ciphertexts that 
are only four group elements plus a tag (under the SXDH assumption). This is 
the shortest IBE scheme under the SXDH assumption, and is technically even 
shorter than a recent and independently obtained scheme of [5] which requires 
five group elements as ciphertext. Table 2 depicts numerical differences between
the parameter sizes of the two schemes. The SXDH-IBE scheme of [5] uses the
concept of dual pairing vector spaces (due to Okamoto and Takashima [19,20]

5 However, since commitments in Groth-Sahai NIZKs are linear, there is scope for
mixing the two systems to gain efficiency.
and synthesized from Waters’ dual system IBE). However, the dual vector space
and its generalizations due to others m do not capture the idea of proof ver- 
ification. Thus, one of our main contributions can be viewed as showing that 
the dual system not only does zero-knowledge simulation but also extends to 
provide a computationally sound verifier for general linear systems. 


Table 2. Comparison with the SXDH-based IBE of Chen et al. [6]. The notation | · |
denotes the bit length of an element of the given group.

Scheme        Public Key          Secret Key   Ciphertext   #Pairings   Anonymity
CLLWW12 [6]   8|G_1| + |G_T|
This paper    5|G_1| + |G_T|      5|G_2|


Finally, using our QA-NIZKs we show a short publicly-verifiable CCA2-secure 
IBE scheme. Public verifiability is an informal but practically important notion 
which implies that one can publicly verify if the decryption will yield “invalid 
ciphertext” . Thus, this can allow a network gateway to act as a filter. Our scheme 
only requires two additional group elements over the basic IBE scheme. 

Organization of the Paper. We begin the rest of the paper with the definition 
of quasi-adaptive NIZKs in Section 2. In Section 3 we develop quasi-adaptive
NIZKs for linear subspaces under the XDH assumption. In Section 4 we extend
our system to include tags, define a notion called split-CRS QA-NIZKs and
extend our system to construct split-CRS NIZKs for affine spaces. Finally, we
demonstrate applications of our system in Section 5. We defer detailed proofs
and descriptions to the full paper US. We also describe our system based on the
k-linear assumption in ng.

Notations. We will be dealing with witness-relations R that are binary rela- 
tions on pairs (x,w), and where w is commonly referred to as the witness. Each 
witness-relation defines a language $L = \{x \mid \exists w : R(x,w)\}$. For every witness-relation
$R_p$ we will use $L_p$ to denote the language it defines. Thus, a NIZK proof
for a witness-relation $R_p$ can also be seen as a NIZK proof for its language $L_p$.

Vectors will always be row-vectors and will always be denoted by an arrow
over the letter, e.g. $\vec{r}$ for a (row) vector of $Z_q$ elements, and $\vec{d}$ for a (row) vector of
group elements.

2 Quasi-Adaptive NIZK Proofs

Instead of considering NIZK proofs for a (witness-) relation R, we will consider
Quasi-Adaptive NIZK proofs for a probability distribution $\mathcal{D}$ on a collection of
(witness-) relations $\mathcal{R} = \{R_p\}$. The quasi-adaptiveness allows for the common
reference string (CRS) to be set based on $R_p$ after the latter has been chosen
according to $\mathcal{D}$. We will however require, as we will see later, that the simulator
generating the CRS (in the simulation world) is a single probabilistic polynomial
time algorithm that works for the whole collection of relations $\mathcal{R}$.

To be more precise, we will consider ensembles of distributions on witness-relations,
each distribution in the ensemble itself parametrized by a security
parameter. Thus, we will consider an ensemble $\{\mathcal{D}_\lambda\}$ of distributions on collections of
relations $\mathcal{R}_\lambda$, where each $\mathcal{D}_\lambda$ specifies a probability distribution on $\mathcal{R}_\lambda = \{R_{\lambda,p}\}$.
When $\lambda$ is clear from context, we will just refer to a particular relation as $R_p$,
and write $\mathcal{R}_\lambda = \{R_p\}$.

Since in the quasi-adaptive setting the CRS could depend on the relation, we 
must specify what information about the relation is given to the CRS generator. 
Thus, we will consider an associated parameter language such that a member of 
this language is enough to characterize a particular relation, and this language 
member is provided to the CRS generator. For example, consider the class of 
parametrized relations $\mathcal{R} = \{R_p\}$, where parameter p is a tuple g, f, h of three
group elements. Suppose $R_p$ (on $\langle l_1, l_2, l_3 \rangle, \langle x_1, x_2 \rangle$) is defined as

$$R_{\langle g,f,h \rangle}(\langle l_1, l_2, l_3 \rangle, \langle x_1, x_2 \rangle) \stackrel{\mathrm{def}}{=} \left( x_1, x_2 \in Z_q,\ l_1, l_2, l_3 \in G \text{ and } l_1 = x_1 \cdot g,\ l_2 = x_2 \cdot f,\ l_3 = (x_1 + x_2) \cdot h \right).$$

For this class of relations, one could seek a quasi-adaptive NIZK where the CRS
generator is just given p as input. Thus in this case, the associated parameter
language $\mathcal{L}_{par}$ will just be triples of group elements⁶. Moreover, the distribution
$\mathcal{D}$ can just be on the parameter language $\mathcal{L}_{par}$, i.e. $\mathcal{D}$ just specifies a $p \in \mathcal{L}_{par}$.
Again, $\mathcal{L}_{par}$ is technically an ensemble.

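We call $(K_0, K_1, P, V)$ a QA-NIZK proof system for witness-relations $\mathcal{R}_\lambda = \{R_p\}$
with parameters sampled from a distribution $\mathcal{D}$ over associated parameter
language $\mathcal{L}_{par}$, if there exists a probabilistic polynomial time simulator $(S_1, S_2)$,
such that for all non-uniform PPT adversaries $A_1, A_2, A_3$ we have:

Quasi-Adaptive Completeness:

$$\Pr[\lambda \leftarrow K_0(1^m);\ p \leftarrow \mathcal{D}_\lambda;\ \psi \leftarrow K_1(\lambda, p);\ (x, w) \leftarrow A_1(\lambda, \psi, p);\ \pi \leftarrow P(\psi, x, w) : V(\psi, x, \pi) = 1 \text{ if } R_p(x, w)] = 1$$

Quasi-Adaptive Soundness:

$$\Pr[\lambda \leftarrow K_0(1^m);\ p \leftarrow \mathcal{D}_\lambda;\ \psi \leftarrow K_1(\lambda, p);\ (x, \pi) \leftarrow A_2(\lambda, \psi, p) : V(\psi, x, \pi) = 1 \text{ and } \neg(\exists w : R_p(x, w))] \approx 0$$

Quasi-Adaptive Zero-Knowledge:

$$\Pr[\lambda \leftarrow K_0(1^m);\ p \leftarrow \mathcal{D}_\lambda;\ \psi \leftarrow K_1(\lambda, p) : A_3^{P(\psi, \cdot, \cdot)}(\lambda, \psi, p) = 1] \approx \Pr[\lambda \leftarrow K_0(1^m);\ p \leftarrow \mathcal{D}_\lambda;\ (\psi, \tau) \leftarrow S_1(\lambda, p) : A_3^{S(\psi, \tau, \cdot, \cdot)}(\lambda, \psi, p) = 1],$$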

6 It is worth remarking that alternatively the parameter language could also be discrete 
logarithms of these group elements (w.r.t. some base), but a NIZK proof under
this associated language may not be very useful. Thus, it is critical to define the 
proper associated parameter language. 
where $S(\psi, \tau, x, w) = S_2(\psi, \tau, x)$ for $(x, w) \in R_p$ and both oracles (i.e. P and
S) output failure if $(x, w) \notin R_p$.

Note that $\psi$ is the CRS in the above definitions.

3 QA-NIZK for Linear Subspaces under the XDH 
Assumption 

Setup. Let $G_1$, $G_2$ and $G_T$ be cyclic groups of prime order q with a bilinear
map $e : G_1 \times G_2 \to G_T$ chosen by a group generation algorithm. Let $g_1$ and
$g_2$ be generators of the groups $G_1$ and $G_2$ respectively. Let $0_1$, $0_2$ and $0_T$ be
the identity elements in the three groups $G_1$, $G_2$ and $G_T$ respectively. We use
additive notation for the group operations in all the groups.

The bilinear pairing e naturally extends to $Z_q$-vector spaces of $G_1$ and $G_2$
of the same dimension n as follows: $e(\vec{a}, \vec{b}^{\top}) = \sum_{i=1}^{n} e(a_i, b_i)$. Thus, if $\vec{a} = \vec{x} \cdot g_1$
and $\vec{b} = \vec{y} \cdot g_2$, where $\vec{x}$ and $\vec{y}$ are now vectors over $Z_q$, then $e(\vec{a}, \vec{b}^{\top}) = (\vec{x} \cdot \vec{y}^{\top}) \cdot e(g_1, g_2)$.
The operator “⊤” indicates taking the transpose.
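
The extension of the pairing to vectors can be sanity-checked in a toy model where group elements are represented by their discrete logarithms modulo a small prime, so that the pairing of two elements is the product of their logarithms. The sketch below is an illustration only: the prime `Q`, the stand-in generators `G1_GEN` and `G2_GEN`, and the helper names are all assumptions, and no real pairing library is involved.

```python
Q = 101      # small prime, for illustration only
G1_GEN = 2   # toy stand-in for the generator g_1 (a nonzero element of Z_Q)
G2_GEN = 3   # toy stand-in for the generator g_2


def pair(a, b):
    """Toy pairing on discrete logs: bilinear by construction."""
    return (a * b) % Q


def pair_vec(avec, bvec):
    """Extension to vectors of equal dimension: sum of component-wise pairings."""
    if len(avec) != len(bvec):
        raise ValueError("dimension mismatch")
    return sum(pair(a, b) for a, b in zip(avec, bvec)) % Q


def scale(xs, gen):
    """Return the vector x . gen, i.e. every scalar multiplied into the generator."""
    return [(x * gen) % Q for x in xs]


x = [2, 3, 4]
y = [5, 6, 7]
lhs = pair_vec(scale(x, G1_GEN), scale(y, G2_GEN))
rhs = (sum(xi * yi for xi, yi in zip(x, y)) * pair(G1_GEN, G2_GEN)) % Q
```

In this model `lhs` and `rhs` agree, which is the identity relating the vector pairing to the inner product of the scalar vectors times the pairing of the generators.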

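Linear Subspace Languages. To start off with an example, a set of equations
$l_1 = x_1 \cdot g$, $l_2 = x_2 \cdot f$, $l_3 = (x_1 + x_2) \cdot h$ will be expressed in the form $\vec{l} = \vec{x} \cdot A$
as follows:

$$\begin{bmatrix} l_1 & l_2 & l_3 \end{bmatrix} = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \cdot \begin{bmatrix} g & 0_1 & h \\ 0_1 & f & h \end{bmatrix}$$

where $\vec{x}$ is a vector of unknowns and A is a matrix specifying the group constants
g, f, h.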

The scalars in this system of equations are from the field $Z_q$. In general, we
consider languages that are linear subspaces of vectors of $G_1$ elements. These
are just $Z_q$-modules, and since $Z_q$ is a field, they are vector spaces. In other
words, the languages we are interested in can be characterized as languages
parameterized by A as below:

$L_A = \{ \vec{x} \cdot A \in G_1^n \mid \vec{x} \in Z_q^t \}$, where A is a t × n matrix of $G_1$ elements.

Here A is an element of the associated parameter language $\mathcal{L}_{par}$, which is all
t × n matrices of $G_1$ elements. The parameter language $\mathcal{L}_{par}$ also has a corresponding
witness relation $\mathcal{R}_{par}$, where the witness is a matrix $\bar{A}$ of $Z_q$ elements:
$\mathcal{R}_{par}(A, \bar{A})$ iff $A = \bar{A} \cdot g_1$.

Robust and Efficiently Witness-Samplable Distributions. Let the $t \times n$ dimensional
matrix $\mathbf{A}$ be chosen according to a distribution $\mathcal{D}$ on $\mathcal{L}_{par}$. We will call
the distribution $\mathcal{D}$ robust if with probability close to one the left-most $t$ columns
of $\mathbf{A}$ are full-ranked. We will call a distribution $\mathcal{D}$ on $\mathcal{L}_{par}$ efficiently
witness-samplable if there is a probabilistic polynomial time algorithm that outputs
a pair of matrices $(A, \mathbf{A})$ satisfying the relation $\mathcal{R}_{par}$ (i.e., $\mathcal{R}_{par}(A, \mathbf{A})$
holds), and further the resulting distribution of the output $\mathbf{A}$ is the same as $\mathcal{D}$. For
example, the uniform distribution on $\mathcal{L}_{par}$ is efficiently witness-samplable, by
first picking $A$ at random, and then computing $\mathbf{A}$. As an example of a robust
distribution, consider a distribution $\mathcal{D}$ on $(2 \times 3)$-dimensional matrices
$\begin{bmatrix} g & 0_1 & h \\ 0_1 & f & h \end{bmatrix}$
with $g$, $f$ and $h$ chosen randomly from $G_1$. It is easy to see that the first two
columns are full-ranked if $g \neq 0_1$ and $f \neq 0_1$, which holds with probability
$(1 - 1/q)^2$.
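The probability $(1 - 1/q)^2$ can be confirmed by brute force for a small prime. The sketch below is only a sanity check under the assumption that $G_1$ elements are modelled by their discrete logs in $\mathbb{Z}_q$; the function name `robust_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def robust_fraction(q):
    """Fraction of pairs (g, f) in Z_q^2 whose matrix [[g, 0], [0, f]] is full-ranked.

    Elements of G1 are modelled by their discrete logs (toy assumption). The
    first two columns have determinant g * f, which is nonzero modulo a prime q
    exactly when both g and f are nonzero. The third column (h) plays no role.
    """
    good = sum(1 for g, f in product(range(q), repeat=2) if (g * f) % q != 0)
    return Fraction(good, q * q)


q = 7
exact = robust_fraction(q)
formula = (1 - Fraction(1, q)) ** 2
print(exact, formula)
```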


QA-NIZK Construction. We now describe a computationally sound quasi-adaptive
NIZK $(K_0, K_1, P, V)$ for linear subspace languages $\{L_{\mathbf{A}}\}$ with parameters
sampled from a robust and efficiently witness-samplable distribution $\mathcal{D}$ over the
associated parameter language $\mathcal{L}_{par}$.

Algorithm $K_0$. $K_0$ is the same as the group generation algorithm for which the XDH
assumption holds: $\lambda := (q, G_1, G_2, G_T, e, g_1, g_2) \leftarrow K_0(1^m)$, with
$(q, G_1, G_2, G_T, e, g_1, g_2)$ as described above.

We will assume that the size $t \times n$ of the matrix $\mathbf{A}$ is either fixed or determined
by the security parameter $m$. In general, $t$ and $n$ could also be part of the
parameter language, and hence $t$, $n$ could be given as part of the input to the CRS
generator $K_1$.

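Algorithm $K_1$. The algorithm $K_1$ generates the CRS as follows. Let $\mathbf{A}^{t \times n}$ be
the parameter supplied to $K_1$. Let $s = n - t$: this is the number of equations
in excess of the unknowns. It generates a matrix $D^{t \times s}$ with all elements chosen
randomly from $\mathbb{Z}_q$ and a single element $b$ chosen randomly from $\mathbb{Z}_q$. The common
reference string (CRS) has two parts $\mathrm{CRS}_p$ and $\mathrm{CRS}_v$, which are to be used by
the prover and the verifier respectively.

$$ \mathrm{CRS}_p^{t \times s} := \mathbf{A} \cdot \begin{bmatrix} D^{t \times s} \\ b^{-1} \cdot I^{s \times s} \end{bmatrix}, \qquad \mathrm{CRS}_v^{(n+s) \times s} := \begin{bmatrix} b \cdot D^{t \times s} \\ I^{s \times s} \\ -b \cdot I^{s \times s} \end{bmatrix} \cdot g_2 $$

Here, $I$ denotes the identity matrix. Note that $\mathrm{CRS}_v$ is independent of the
parameter.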

Prover $P$. Given candidate $l = x \cdot \mathbf{A}$ with witness vector $x$, the prover generates
the following proof consisting of $s$ elements in $G_1$:

$$ p := x \cdot \mathrm{CRS}_p $$

Verifier $V$. Given candidate $l$, and a proof $p$, the verifier checks the following:

$$ e([\, l \mid p \,], \mathrm{CRS}_v) = 0_T^{1 \times s} $$

The security of the above system depends on the DDH assumption in group
$G_2$. Since $G_2$ is a bilinear group, this assumption is known as the XDH assumption.
These assumptions are standard and are formally described in [15].
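The three algorithms $K_1$, $P$ and $V$ can be exercised end to end in a toy model. The sketch below is not a secure implementation: it assumes group elements are represented by their discrete logs modulo a prime `Q` (so the pairing is a product of logs and $g_2$ has log 1), and it samples $b$ nonzero so that $b^{-1}$ exists. The function names `mat_mul`, `k1`, `prove` and `verify` are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption)


def mat_mul(X, Y):
    """Matrix product over Z_Q."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % Q
             for j in range(len(Y[0]))] for i in range(len(X))]


def k1(A, rng):
    """CRS generator for a t x n parameter matrix A of toy G1 elements."""
    t, n = len(A), len(A[0])
    s = n - t
    D = [[rng.randrange(Q) for _ in range(s)] for _ in range(t)]
    b = rng.randrange(1, Q)                  # nonzero, so b^{-1} exists
    b_inv = pow(b, -1, Q)
    ident = [[1 if i == j else 0 for j in range(s)] for i in range(s)]
    # CRS_p = A . [D ; b^{-1} I]  (t x s matrix of G1 elements)
    crs_p = mat_mul(A, D + [[b_inv * v % Q for v in row] for row in ident])
    # CRS_v = [b D ; I ; -b I] . g2  ((n+s) x s matrix of G2 elements)
    crs_v = ([[b * v % Q for v in row] for row in D] + ident
             + [[-b * v % Q for v in row] for row in ident])
    return crs_p, crs_v


def prove(x, crs_p):
    """Proof p = x . CRS_p, consisting of s elements."""
    return mat_mul([x], crs_p)[0]


def verify(l, p, crs_v):
    """Check e([l | p], CRS_v) = 0_T^{1 x s} (pairing = product of logs here)."""
    return all(v == 0 for v in mat_mul([l + p], crs_v)[0])


rng = random.Random(0)
t, n = 2, 5
A = [[rng.randrange(Q) for _ in range(n)] for _ in range(t)]
crs_p, crs_v = k1(A, rng)
x = [rng.randrange(Q) for _ in range(t)]
l = mat_mul([x], A)[0]                       # language member l = x . A
p = prove(x, crs_p)
ok = verify(l, p, crs_v)
bad = l[:-1] + [(l[-1] + 1) % Q]             # perturb the last component
rejected = not verify(bad, p, crs_v)
print(ok, rejected)
```

Completeness holds because $[\,l_0 \mid l_1 \mid p\,] \cdot [\,bD;\ I;\ -bI\,] = b\,l_0 D + l_1 - b\,(l_0 D + b^{-1} l_1) = 0$.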

Theorem 1. The above algorithms $(K_0, K_1, P, V)$ constitute a computationally
sound quasi-adaptive NIZK proof system for linear subspace languages $\{L_{\mathbf{A}}\}$ with
parameters $\mathbf{A}$ sampled from a robust and efficiently witness-samplable distribution
$\mathcal{D}$ over the associated parameter language $\mathcal{L}_{par}$, given any group generation
algorithm for which the DDH assumption holds for group $G_2$.

Remark. For language members, the proofs are unique as the bottom $s$ rows of
$\mathrm{CRS}_v$ are invertible.

Proof Intuition. A detailed proof of the theorem can be found in [15]. Here we
give the main idea behind the working of the above quasi-adaptive NIZK, and
in particular the soundness requirement which is the difficult part here. We first
observe that completeness follows by straightforward bilinear manipulation. Zero
Knowledge also follows easily: the simulator generates the same CRS as above
but retains $D$ and $b$ as trapdoors. Now, given a language candidate $l$, the proof
is simply $p := l \cdot \begin{bmatrix} D^{t \times s} \\ b^{-1} \cdot I^{s \times s} \end{bmatrix}$. If $l$ is in the language, i.e., it is $x \cdot \mathbf{A}$ for some $x$,
then the distribution of the simulated proof is identical to the real world proof.
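The claim that simulated and real proofs coincide on language members is just associativity: $x \cdot (\mathbf{A} \cdot [D; b^{-1} I]) = (x \cdot \mathbf{A}) \cdot [D; b^{-1} I]$. The sketch below checks this in the toy discrete-log representation used purely for illustration (an assumption, not a secure instantiation); `row_times` and the variable names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption)


def row_times(v, M):
    """Row vector v times matrix M over Z_Q."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) % Q for j in range(len(M[0]))]


rng = random.Random(1)
t, n = 2, 4
s = n - t
A = [[rng.randrange(Q) for _ in range(n)] for _ in range(t)]   # toy G1 elements
D = [[rng.randrange(Q) for _ in range(s)] for _ in range(t)]
b = rng.randrange(1, Q)
b_inv = pow(b, -1, Q)

# Trapdoor matrix [D ; b^{-1} I], kept by the simulator.
trapdoor = D + [[b_inv if i == j else 0 for j in range(s)] for i in range(s)]
crs_p = [row_times(row, trapdoor) for row in A]                # A . [D ; b^{-1} I]

x = [rng.randrange(Q) for _ in range(t)]
l = row_times(x, A)                       # language member

real_proof = row_times(x, crs_p)          # prover: needs the witness x
sim_proof = row_times(l, trapdoor)        # simulator: needs only l and (D, b)
print(real_proof == sim_proof)
```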

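We now focus on the soundness proof which we establish by transforming the
system over two games. Let Game $\mathbf{G}_0$ be the original system. Since $\mathcal{D}$ is efficiently
witness-samplable, in Game $\mathbf{G}_1$ the challenger generates both $A$ and $\mathbf{A} = A \cdot g_1$.
Then it computes a rank $s$ matrix $\begin{bmatrix} W^{t \times s} \\ I^{s \times s} \end{bmatrix}$ of dimension $(t+s) \times s$ whose columns
form a complete basis for the null-space of $A$, which means
$A \cdot \begin{bmatrix} W^{t \times s} \\ I^{s \times s} \end{bmatrix} = 0^{t \times s}$.
Now statistically, the CRS in Game $\mathbf{G}_0$ is indistinguishable from the one where
we substitute $D' + b^{-1} \cdot W$ for $D$, where $D'$ itself is an independent random matrix.
With this substitution, $\mathrm{CRS}_p$ and $\mathrm{CRS}_v$ can be represented as

$$ \mathrm{CRS}_p^{t \times s} = \mathbf{A} \cdot \begin{bmatrix} D' \\ 0^{s \times s} \end{bmatrix}, \qquad \mathrm{CRS}_v^{(n+s) \times s} = \begin{bmatrix} b \cdot D' \\ 0^{s \times s} \\ -b \cdot I^{s \times s} \end{bmatrix} \cdot g_2 + \begin{bmatrix} W \\ I^{s \times s} \\ 0^{s \times s} \end{bmatrix} \cdot g_2 $$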
Now we show that if an efficient adversary can produce a “proof” $p$ for which
the above pairing test holds and yet the candidate $l$ is not in $L_{\mathbf{A}}$, then it implies
an efficient adversary that can break DDH in group $G_2$. So consider a DDH game,
where a challenger either provides a real DDH-tuple $(g_2, b \cdot g_2, r \cdot g_2, \chi = br \cdot g_2)$
or a fake DDH tuple $(g_2, b \cdot g_2, r \cdot g_2, \chi = br' \cdot g_2)$. Let us partition the $\mathbb{Z}_q$ matrix
$A$ as $[A_0^{t \times t} \mid A_1^{t \times s}]$ and the candidate vector $l$ as $[l_0^{1 \times t} \mid l_1^{1 \times s}]$. Note that, since
$A_0$ has rank $t$, the elements of $l_0$ are ‘free’ elements and $l_0$ can be extended to
a unique $n$ element vector $l'$, which is a member of $L_{\mathbf{A}}$. This member vector $l'$
can be computed as $l' := [\, l_0 \mid -l_0 \cdot W \,]$, noting $W = -A_0^{-1} A_1$. The proof of
$l'$ is computed as $p' := l_0 \cdot D'$. Since both $(l, p)$ and $(l', p')$ pass the verification
equation, we obtain: $l'_1 - l_1 = b(p' - p)$, where $l'_1 = -l_0 \cdot W$. In particular
there exists $i \in [1, s]$, such that $l'_{1i} - l_{1i} = b(p'_i - p_i) \neq 0_1$. This gives us a
straightforward test for the DDH challenge: $e(l'_{1i} - l_{1i}, r \cdot g_2) \stackrel{?}{=} e(p'_i - p_i, \chi)$.
This leads to a proof of soundness of the QA-NIZK.

Remark. Observe from the proof above that the soundness can be based on
the following computational assumption which is implied by XDH, which is a
decisional assumption:

Definition 1. Consider a generation algorithm $\mathcal{G}$ taking the security parameter
as input, that outputs a tuple $(q, G_1, G_2, G_T, e, g_1, g_2)$, where $G_1$, $G_2$ and $G_T$
are groups of prime order $q$ with generators $g_1$, $g_2$ and $e(g_1, g_2)$ respectively and
which allow an efficiently computable $\mathbb{Z}_q$-bilinear pairing map $e : G_1 \times G_2 \to G_T$.
The assumption asserts that the following problem is hard: Given $f, f^b \in G_2$,
output $h, h' \in G_1$, such that $h' = h^b \neq 0_1$.
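The relation $l'_1 - l_1 = b(p' - p)$ used in the soundness argument above, which is also what makes the pair $(p'_i - p_i,\ l'_{1i} - l_{1i})$ a solution to the problem in Definition 1, follows in two lines. The derivation below is a worked restatement rather than a quotation from the paper. It assumes the verifier CRS is written, after the substitution $D = D' + b^{-1} W$, as $[\, bD' + W;\ I;\ -bI \,] \cdot g_2$, and it uses non-degeneracy of the pairing to move each check from $G_T$ back to $G_1$.

```latex
\begin{align*}
e([\, l_0 \mid l_1 \mid p \,], \mathrm{CRS}_v) = 0_T^{1 \times s}
  &\;\Longrightarrow\; l_0 \cdot (b D' + W) + l_1 - b\, p = 0_1^{1 \times s}, \\
e([\, l_0 \mid l'_1 \mid p' \,], \mathrm{CRS}_v) = 0_T^{1 \times s}
  &\;\Longrightarrow\; l_0 \cdot (b D' + W) + l'_1 - b\, p' = 0_1^{1 \times s}, \\
\text{subtracting:}\qquad
  & l'_1 - l_1 = b\, (p' - p). \\[4pt]
\text{DDH test, component } i:\qquad
  & e(l'_{1i} - l_{1i},\ r \cdot g_2)
    = e\big(b\,(p'_i - p_i),\ r \cdot g_2\big)
    = e(p'_i - p_i,\ br \cdot g_2).
\end{align*}
```

Since $p'_i - p_i \neq 0_1$, the last expression equals $e(p'_i - p_i, \chi)$ exactly when $\chi = br \cdot g_2$, that is, when the tuple is a real DDH tuple.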

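Example: QA-NIZK for a DH tuple. In this example, we instantiate our general
system to provide a NIZK for a DH tuple, that is a tuple of the form $(x \cdot g, x \cdot f)$
for an a priori fixed base $(g, f) \in G_1^2$. We assume DDH for the group $G_2$.

As in the setup described before, we have $\mathbf{A} = [\, g \;\; f \,]$. The language is:
$L = \{ x \cdot \mathbf{A} \mid x \in \mathbb{Z}_q \}$.

Now proceeding with the framework, we generate $D$ as $[d]$ and the element $b$
where $d$ and $b$ are random elements of $\mathbb{Z}_q$. With this setting, the NIZK CRS is:

$$ \mathrm{CRS}_p := \mathbf{A} \cdot \begin{bmatrix} D \\ b^{-1} \cdot I^{1 \times 1} \end{bmatrix} = [\, d \cdot g + b^{-1} \cdot f \,], \qquad \mathrm{CRS}_v := \begin{bmatrix} b \cdot D \\ I^{1 \times 1} \\ -b \cdot I^{1 \times 1} \end{bmatrix} \cdot g_2 = \begin{bmatrix} bd \cdot g_2 \\ g_2 \\ -b \cdot g_2 \end{bmatrix} $$

The proof of a tuple $(\rho, \hat{\rho})$ with witness $r$ (that is, $\rho = r \cdot g$ and $\hat{\rho} = r \cdot f$) is just
the single element $r \cdot (d \cdot g + b^{-1} \cdot f)$. In the proof of zero knowledge, the simulator
trapdoor is $(d, b)$ and the simulated proof of $(\rho, \hat{\rho})$ is just $(d \cdot \rho + b^{-1} \cdot \hat{\rho})$.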
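The DH-tuple instance is small enough to check by hand or in a few lines. The sketch below again assumes the insecure toy representation in which group elements are discrete logs modulo a prime `Q` and $g_2$ has log 1; the concrete values of `g`, `f`, `d`, `b` and the function names are hypothetical.

```python
Q = 2**61 - 1  # toy prime group order (assumption)

g, f = 3, 10                   # fixed base (toy G1 elements)
d, b = 12345, 67890            # CRS randomness / simulator trapdoor
b_inv = pow(b, -1, Q)

crs_p = (d * g + b_inv * f) % Q            # [d.g + b^{-1}.f]
crs_v = [b * d % Q, 1, -b % Q]             # [bd.g2 ; g2 ; -b.g2]


def prove(r):
    """Real proof for the tuple (r.g, r.f) with witness r."""
    return r * crs_p % Q


def simulate(rho, rho_hat):
    """Simulated proof from the trapdoor (d, b), without the witness."""
    return (d * rho + b_inv * rho_hat) % Q


def verify(rho, rho_hat, p):
    """Check e([rho | rho_hat | p], CRS_v) = 0_T."""
    return (rho * crs_v[0] + rho_hat * crs_v[1] + p * crs_v[2]) % Q == 0


r = 424242
rho, rho_hat = r * g % Q, r * f % Q
accepts = verify(rho, rho_hat, prove(r))
same = prove(r) == simulate(rho, rho_hat)
rejects = not verify(rho, (rho_hat + f) % Q, prove(r))   # (r.g, (r+1).f) is not a DH tuple
print(accepts, same, rejects)
```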

4 Extensions 

In this section we consider some useful extensions of the concepts and construc- 
tions of QA-NIZK systems. We show how the previous system can be extended 
to include tags. The tags are elements of $\mathbb{Z}_q$, are included as part of the proof and
are used as part of the defining equations of the language. We define a notion 
called split-CRS QA-NIZK system, where the prover and verifier use distinct 
parts of a CRS and we construct a split-CRS system for affine systems. 

Tags. While our system works for any number of components in the tuple (except
the first $t$) being dependent on any number of tags, to simplify the presentation
we will focus on only one dependent element and only one tag. Also
for simplicity, we will assume that this element is an affine function of the tag
(the function being defined by parameters). We can handle arbitrary polynomial
functions of the tags as well, but we will focus on affine functions here as most
applications seem to need just affine functions. Then, the languages we handle
can be characterized as

$$ L_{\mathbf{A}, a_1, a_2} = \{ \langle x \cdot [\, \mathbf{A} \mid (a_1^T + \mathrm{tag} \cdot a_2^T) \,], \mathrm{tag} \rangle \mid x \in \mathbb{Z}_q^t,\ \mathrm{tag} \in \mathbb{Z}_q \} $$

where $\mathbf{A}^{t \times (n-1)}$, $a_1^{1 \times t}$ and $a_2^{1 \times t}$ are parameters of the language. A distribution is
still called robust (as in Section 3) if with overwhelming probability the first $t$
columns of $\mathbf{A}$ are full-ranked. Write $\mathbf{A}$ as $[\mathbf{A}_l^{t \times t} \mid \mathbf{A}_r^{t \times (n-1-t)}]$, where without loss
of generality, $\mathbf{A}_l$ is non-singular. While the first $n - 1 - t$ components in excess
of the unknowns, corresponding to $\mathbf{A}_r$, can be verified just as in Section 3, for
the last component we proceed as follows.

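Algorithm $K_1$. The CRS is generated as:

$$ \mathrm{CRS}_{p,0}^{t \times 1} := [\, \mathbf{A}_l \mid a_1^T \,] \cdot \begin{bmatrix} D_1 \\ b^{-1} \end{bmatrix}, \qquad \mathrm{CRS}_{p,1}^{t \times 1} := [\, \mathbf{A}_l \mid a_2^T \,] \cdot \begin{bmatrix} D_2 \\ b^{-1} \end{bmatrix} $$

$$ \mathrm{CRS}_{v,0}^{(t+2) \times 1} := \begin{bmatrix} b \cdot D_1 \\ 1 \\ -b \end{bmatrix} \cdot g_2, \qquad \mathrm{CRS}_{v,1}^{(t+2) \times 1} := \begin{bmatrix} b \cdot D_2 \\ 0 \\ 0 \end{bmatrix} \cdot g_2 $$

where $D_1$ and $D_2$ are random matrices of order $t \times 1$ independent of the matrix
$D$ chosen for proving the other components. The $\mathbb{Z}_q$ element $b$ can be re-used
from the other components.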

Prover. Let $l' = x \cdot [\, \mathbf{A}_l \mid (a_1^T + \mathrm{tag} \cdot a_2^T) \,]$. The prover generates the following
proof for the last component:

$$ p := x \cdot (\mathrm{CRS}_{p,0} + \mathrm{tag} \cdot \mathrm{CRS}_{p,1}) $$

Verifier. Given a proof $p$ for candidate $l'$ the verifier checks the following:

$$ e([\, l' \mid p \,], \mathrm{CRS}_{v,0} + \mathrm{tag} \cdot \mathrm{CRS}_{v,1}) = 0_T $$

The size of the proof is 1 element in the group $G_1$. The proof of completeness,
soundness and zero-knowledge for this quasi-adaptive system is similar to the proof
in Section 3 and a proof sketch can be found in [15].
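Completeness of the tag-based check can be verified mechanically. The sketch below assumes the CRS has the shape $\mathrm{CRS}_{p,j} = [\mathbf{A}_l \mid a_{j+1}^T] \cdot [D_{j+1}; b^{-1}]$ and $\mathrm{CRS}_{v,0} = [bD_1; 1; -b] \cdot g_2$, $\mathrm{CRS}_{v,1} = [bD_2; 0; 0] \cdot g_2$, and it uses the insecure toy representation of group elements by discrete logs modulo a prime `Q`. All function and variable names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption)
rng = random.Random(2)
t = 2

A_l = [[rng.randrange(Q) for _ in range(t)] for _ in range(t)]   # t x t, toy G1 elements
a1 = [rng.randrange(Q) for _ in range(t)]                        # column a_1^T
a2 = [rng.randrange(Q) for _ in range(t)]                        # column a_2^T
D1 = [rng.randrange(Q) for _ in range(t)]                        # t x 1
D2 = [rng.randrange(Q) for _ in range(t)]                        # t x 1
b = rng.randrange(1, Q)
b_inv = pow(b, -1, Q)


def crs_p_column(a, D):
    """[A_l | a^T] . [D ; b^{-1}], a t x 1 column of G1 elements."""
    return [(sum(A_l[i][k] * D[k] for k in range(t)) + a[i] * b_inv) % Q
            for i in range(t)]


crs_p0, crs_p1 = crs_p_column(a1, D1), crs_p_column(a2, D2)
crs_v0 = [b * v % Q for v in D1] + [1, -b % Q]     # [b D1 ; 1 ; -b] . g2
crs_v1 = [b * v % Q for v in D2] + [0, 0]          # [b D2 ; 0 ; 0] . g2


def member(x, tag):
    """l' = x . [A_l | (a1^T + tag . a2^T)], a vector of t + 1 elements."""
    head = [sum(x[i] * A_l[i][j] for i in range(t)) % Q for j in range(t)]
    last = sum(x[i] * (a1[i] + tag * a2[i]) for i in range(t)) % Q
    return head + [last]


def prove(x, tag):
    """p = x . (CRS_p0 + tag . CRS_p1), a single element."""
    return sum(x[i] * (crs_p0[i] + tag * crs_p1[i]) for i in range(t)) % Q


def verify(l, p, tag):
    """Check e([l' | p], CRS_v0 + tag . CRS_v1) = 0_T."""
    crs_v = [(u + tag * w) % Q for u, w in zip(crs_v0, crs_v1)]
    return sum(c * v for c, v in zip(l + [p], crs_v)) % Q == 0


x = [rng.randrange(Q) for _ in range(t)]
tag = rng.randrange(Q)
l = member(x, tag)
accepts = verify(l, prove(x, tag), tag)
tampered = l[:-1] + [(l[-1] + 1) % Q]
rejects = not verify(tampered, prove(x, tag), tag)
print(accepts, rejects)
```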

Split-CRS QA-NIZK Proofs. We note that the QA-NIZK described in Section 3
has an interesting split-CRS property. In a split-CRS QA-NIZK for a distribution
of relations, the CRS generator $K_1$ generates two CRS-es $\psi_p$ and $\psi_v$, such
that the prover $P$ only needs $\psi_p$, and the verifier $V$ only needs $\psi_v$. In addition,
the CRS $\psi_v$ is independent of the particular relation $R_\rho$. In other words the CRS
generator $K_1$ can be split into two PPTs $K_{11}$ and $K_{12}$, such that $K_{11}$ generates
$\psi_v$ using just $\lambda$, and $K_{12}$ generates $\psi_p$ using $\rho$ and a state output by $K_{11}$. The
key generation simulator $S_1$ is also split similarly. The formal definition is given
in [15].

In many applications, split-CRS QA-NIZKs can lead to simpler constructions 
(and their proofs) and possibly shorter proofs. 

Split-CRS QA-NIZK for Affine Spaces. Consider languages that are affine spaces

$$ L_{\mathbf{A}, a} = \{ (x \cdot \mathbf{A} + a) \in G_1^n \mid x \in \mathbb{Z}_q^t \} $$

The parameter language $\mathcal{L}_{par}$ just specifies $\mathbf{A}$ and $a$. A distribution over $\mathcal{L}_{par}$ is
called robust if with overwhelming probability the left-most $t \times t$ sub-matrix of $\mathbf{A}$
is non-singular (full-ranked). If $a$ is given as part of the verifier CRS, then a
QA-NIZK for distributions over this class follows directly from the construction in
Section 3. However, that would make the QA-NIZK non split-CRS. We now show
that the techniques of Section 3 can be extended to give a split-CRS QA-NIZK
for (robust and witness-samplable) distributions over affine spaces.

The common reference string (CRS) has two parts $\psi_p$ and $\psi_v$ which are to
be used by the prover and the verifier respectively. The split-CRS generators $K_{11}$
and $K_{12}$ work as follows. Let $s = n - t$: this is the number of equations in excess
of the unknowns.

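Algorithm $K_{11}$. The verifier CRS generator first generates a matrix $D^{t \times s}$ with
all elements chosen randomly from $\mathbb{Z}_q$ and a single element $b$ chosen randomly
from $\mathbb{Z}_q$. It also generates a row vector $d$ at random from $\mathbb{Z}_q^s$. Next, it computes

$$ \mathrm{CRS}_v^{(n+s) \times s} := \begin{bmatrix} b \cdot D^{t \times s} \\ I^{s \times s} \\ -b \cdot I^{s \times s} \end{bmatrix} \cdot g_2, \qquad f^{1 \times s} := e(g_1, b \cdot d \cdot g_2) $$

The verifier CRS $\psi_v$ is the matrix $\mathrm{CRS}_v$ and $f$.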
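
Algorithm $K_{12}$. The prover CRS generator $K_{12}$ generates

$$ \mathrm{CRS}_p^{(t+1) \times s} := \begin{bmatrix} \mathbf{A} \\ a \end{bmatrix} \cdot \begin{bmatrix} D^{t \times s} \\ b^{-1} \cdot I^{s \times s} \end{bmatrix} - \begin{bmatrix} 0_1^{t \times s} \\ d \cdot g_1 \end{bmatrix} $$

The (prover) CRS $\psi_p$ is just the matrix $\mathrm{CRS}_p$.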

Prover. Given candidate $(x \cdot \mathbf{A} + a)$ with witness vector $x$, the prover generates
the following proof:

$$ p := [\, x \mid 1 \,] \cdot \mathrm{CRS}_p $$

Verifier. Given a proof $p$ of candidate $l$, the verifier checks the following:

$$ e([\, l \mid p \,], \mathrm{CRS}_v) \stackrel{?}{=} f $$

We provide a proof sketch in [15]. The split-CRS QA-NIZK for affine spaces
also naturally extends to include tags as described before in this section.
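The split between the verifier-side generator $K_{11}$ and the prover-side generator $K_{12}$ can be sketched as follows. This is a toy check of completeness only, under two assumptions that should be read as such: group elements are represented by discrete logs modulo a prime `Q` (so $g_1$ and $g_2$ have log 1 and $f$ is stored as the log $b \cdot d$), and the prover CRS is taken to be $[\mathbf{A}; a] \cdot [D; b^{-1} I] - [0; d \cdot g_1]$, which is the form forced by the prover and verifier equations. Function names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption)


def row_times(v, M):
    """Row vector v times matrix M over Z_Q."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) % Q for j in range(len(M[0]))]


def k11(t, s, rng):
    """Verifier CRS generator: depends only on the dimensions, not on (A, a)."""
    D = [[rng.randrange(Q) for _ in range(s)] for _ in range(t)]
    b = rng.randrange(1, Q)
    d = [rng.randrange(Q) for _ in range(s)]
    ident = [[int(i == j) for j in range(s)] for i in range(s)]
    crs_v = ([[b * v % Q for v in row] for row in D] + ident
             + [[-b * v % Q for v in row] for row in ident])
    f = [b * dj % Q for dj in d]              # f = e(g1, b.d.g2) in toy logs
    return (crs_v, f), (D, b, d)              # (psi_v, state passed to K12)


def k12(A, a, state):
    """Prover CRS generator: uses the parameters (A, a) and the state from K11."""
    D, b, d = state
    s = len(d)
    b_inv = pow(b, -1, Q)
    M = D + [[b_inv if i == j else 0 for j in range(s)] for i in range(s)]
    rows = [row_times(row, M) for row in A]                       # A . [D ; b^{-1} I]
    last = [(v - dj) % Q for v, dj in zip(row_times(a, M), d)]    # a . [D ; b^{-1} I] - d.g1
    return rows + [last]


def prove(x, crs_p):
    return row_times(x + [1], crs_p)          # p = [x | 1] . CRS_p


def verify(l, p, psi_v):
    crs_v, f = psi_v
    return row_times(l + p, crs_v) == f       # e([l | p], CRS_v) == f


rng = random.Random(4)
t, n = 2, 5
psi_v, state = k11(t, n - t, rng)
A = [[rng.randrange(Q) for _ in range(n)] for _ in range(t)]
a = [rng.randrange(Q) for _ in range(n)]
psi_p = k12(A, a, state)
x = [rng.randrange(Q) for _ in range(t)]
l = [(u + v) % Q for u, v in zip(row_times(x, A), a)]            # l = x . A + a
accepts = verify(l, prove(x, psi_p), psi_v)
shifted = l[:-1] + [(l[-1] + 1) % Q]
rejects = not verify(shifted, prove(x, psi_p), psi_v)
print(accepts, rejects)
```

Note that `k11` never sees `A` or `a`, which mirrors the requirement that $\psi_v$ be independent of the language parameters.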


5 Applications 

In this section we mention several important applications of quasi-adaptive NIZK 
proofs. Before we go into the details of these applications, we discuss the general 
applicability of quasi-adaptive NIZKs. Recall in quasi-adaptive NIZKs, the CRS 
is set based on the language for which proofs are required. In many applications 
the language is set by a trusted party, and the most obvious example of this is 
the trusted party that sets the CRS in some UC applications, many of which 
have UC realizations only with a CRS. Another obvious example is the (H)IBE
trusted party that issues secret keys to various identities. In many public key
applications, the party issuing the public key is also considered trusted, i.e. 
incorruptible, as security is defined with respect to the public key issuing party 
(acting as challenger). Thus, in all these settings if the language for which proofs 
are required is determined by an incorruptible party, then that party can also
issue the QA-NIZK CRS based on that language. It stands to reason that most 
languages for which proofs are required are ultimately set by an incorruptible 
party (at least as far as the security definitions are concerned), although they may 
not be linear subspaces, and can indeed be multi-linear or even quadratic. For 
example, suppose a potentially corruptible party $P$ wants to (NIZK) prove that
$x \in L_\rho$, where $L_\rho$ is a language that it generated. However, this proof is unlikely
to be of any use unless it also proves something about $L_\rho$, e.g., that $\rho$ itself is in
another language, say $L'$. In some applications, potentially corruptible parties
generate new linear languages using random tags. However, the underlying basis 
for these languages is set by a trusted party, and hence the NIZK CRS for the 
whole range of tag based languages can be generated by that trusted party based 
on the original basis for these languages. 

Adaptive UC Commitments in the Erasure Model. The SXDH-based commitment
scheme from m requires the following quasi-adaptive NIZK proof (see m
for details)

$$ \{ (R, S, T) \mid \exists r : R = r \cdot g,\ S = r \cdot h,\ T = r \cdot (d_1 + \mathrm{tag} \cdot e_1) \} $$

with parameters $h$, $d_1$, $e_1$ (chosen randomly), which leads to a UC commitment
scheme with commitment consisting of 3 $G_1$ elements, and a proof consisting of
two $G_2$ elements. Under DLIN, a similar scheme leads to a commitment consisting
of 4 elements and an opening of another 4 elements, whereas m stated a
scheme using Groth-Sahai NIZK proofs requiring (5 + 16) elements. More details
can be found in m.

One-time (Relatively) Simulation-Sound NIZK for DDH and Others. In [13]
it was shown that for linear subspace languages, such as the DDH or DLIN
language, or the language showing that two El-Gamal encryptions are of the
same message HBI321, the NIZK proof can be made one-time simulation sound
using a projective hash proof [7] and proving in addition that the hash proof is 
correct. For the DLIN language, this one-time simulation sound proof (in Groth- 
Sahai system) required 15 group elements, whereas the quasi-adaptive proof in 
this paper leads to a proof of size only 5 group elements. 

Signatures. We will now show a generic construction of existentially unforgeable 
signature scheme (against adaptive adversaries) from labeled CCA2-encryption 
schemes and split-CRS QA-NIZK proof system (as defined in Section 4) for
a related language distribution. This construction is a generalization of a
signature scheme from [5] which used (fully) adaptive NIZK proofs and required
constructions based on groups in which the CDH assumption holds.

Let $\mathcal{E}$ = (KeyGen, Enc, Dec) be a labeled CCA-encryption scheme on messages.
Let $X_m$ be any subset of the message space of $\mathcal{E}$ such that $1/|X_m|$ is negligible
in the security parameter $m$. Consider the following class of (parametrized)
languages $\{L_\rho\}$:

$$ L_\rho = \{ (c, M) \mid \exists r : c = \mathrm{Enc}_{pk}(u; r; M) \} $$

with parameter $\rho = (u, pk)$. The notation $\mathrm{Enc}_{pk}(u; r; M)$ means that $u$ is
encrypted under public key $pk$ with randomness $r$ and label $M$. Consider the
following distribution $\mathcal{D}$ on the parameters: $u$ is chosen uniformly at random
from $X_m$ and $pk$ is generated using the probabilistic algorithm KeyGen of $\mathcal{E}$ on
$1^m$ (the secret key is discarded). Note we have an ensemble of distributions, one
for each value of the security parameter, but we will suppress these details.

Let $Q = (K_0, (K_{11}, K_{12}), P, V)$ be a split-CRS QA-NIZK for distribution $\mathcal{D}$ on
$\{L_\rho\}$. Note that the associated parameter language $\mathcal{L}_{par}$ is just the set of pairs
$(u, pk)$, and $\mathcal{D}$ specifies a distribution on $\mathcal{L}_{par}$.

Now, consider the following signature scheme S. 

Key Generation. On input a security parameter $m$, run $K_0(1^m)$ to get $\lambda$. Let
$\mathcal{E}.pk$ be generated using KeyGen of $\mathcal{E}$ on $1^m$ (the secret key $sk$ is discarded).
Choose $u$ at random from $X_m$. Let $\rho = (u, \mathcal{E}.pk)$. Generate $\psi_v$ by running $K_{11}$
on $\lambda$ (it also generates a state $s$). Generate $\psi_p$ by running $K_{12}$ on $(\lambda, \rho)$ and state
$s$. The public key $\mathcal{S}.pk$ of the signature scheme is then $\psi_v$. The secret key $\mathcal{S}.sk$
consists of $(u, \mathcal{E}.pk, \psi_p)$.

Sign. The signature on $M$ just consists of a pair $(c, \pi)$, where $c$ is an $\mathcal{E}$-encryption
of $u$ with label $M$ (using public key $\mathcal{E}.pk$ and randomness $r$), and $\pi$ is the
QA-NIZK proof generated using prover $P$ of $Q$ on input $(\psi_p, (c, M), r)$. Recall $r$ is
the witness to the language member $(c, M)$ of $L_\rho$ (and $\rho = (u, \mathcal{E}.pk)$).

Verify. Given the public key $\mathcal{S}.pk$ ($= \psi_v$), and a signature $(c, \pi)$ on message $M$,
the verifier uses the verifier $V$ of $Q$ and outputs $V(\psi_v, (c, M), \pi)$.

Theorem 2. If $\mathcal{E}$ is a labeled CCA2-encryption scheme and $Q$ is a split-CRS
quasi-adaptive NIZK system for distribution $\mathcal{D}$ on class of languages $\{L_\rho\}$
described above, then the signature scheme described above is existentially
unforgeable under adaptive chosen message attacks.

The theorem is proved in [15]. It is worth remarking here that the reason
one can use a quasi-adaptive NIZK here is because the language $L_\rho$ for which
(multiple) NIZK proof(s) is required is set (or chosen) by the (signature scheme)
key generator, and hence the key generator can generate the CRS for the NIZK
after it sets the language. The proof of the above theorem can be understood
in terms of simulation-soundness. Suppose the above split-CRS QA-NIZK was
also unbounded simulation-sound. Then, one can replace the CCA2 encryption
scheme with just a CPA-encryption scheme, and still get a secure signature
scheme. A proof sketch of this is as follows: an Adversary $B$ is only given $\psi_v$
(which is independent of parameters, including $u$). Further, the simulator for the
QA-NIZK can replace all proofs by simulated proofs (that do not use witness $r$
used for encryption). Next, one can employ CPA-security to replace encryptions
of $u$ by encryptions of 1. By unbounded simulation soundness of the QA-NIZK
it follows that if $B$ produces a verifying signature then it must have produced
an encryption of $u$. However, the view of $B$ is independent of $u$, and hence its
probability of forging a signature is negligible.

However, the best known technique for obtaining efficient unbounded simulation
soundness itself requires CCA2 encryption (see [5]), and in addition NIZK
proofs for quadratic equations. On the other hand, if we instantiate the above
theorem with Cramer-Shoup encryption scheme, we get remarkably short signatures
(in fact the shortest signatures under any static and standard assumption).
The Cramer-Shoup encryption scheme PK consists of $g$, $f$, $k$, $d$, $e$ chosen
randomly from $G_1$, along with a target collision-resistant hash function $\mathcal{H}$ (with
a public random key). The set $X$ from which $u$ is chosen is just the whole group
$G_1$. Then an encryption of $u$ is obtained by picking $r$ at random, and obtaining
the tuple

$$ \langle R = r \cdot g,\ S = r \cdot f,\ T = u + r \cdot k,\ H = r \cdot (d + \mathrm{tag} \cdot e) \rangle $$

where $\mathrm{tag} = \mathcal{H}(R, S, T, M)$. It can be shown that it suffices to hide $u$ with the
hash proof $H$ (although one has to go into the internals of the hash-proof based
CCA2 encryption; see Appendix in [13]). Thus, we just need a (split-CRS)
QA-NIZK for the tag-based affine system (it is affine because of the additive constant
$u$). There is one variable $r$, and three equations (four if we consider the original
CCA2 encryption). Thus, we just need $(3 - 1) \cdot 1$ $(= 2)$ proof elements, leading to
a total signature size of 5 elements (i.e. $R$, $S$, $u + H$, and the two proof elements)
under the SXDH assumption.


Dual-System Fully Secure IBE. It is well-known that Identity Based Encryption
(IBE) implies signature schemes (due to Naor), but the question arises whether
the above signature scheme using Cramer-Shoup CCA2-encryption and the related
QA-NIZK can be converted into an IBE scheme. To achieve this, we take
a hint from Naor’s IBE to Signature Scheme conversion, and let the signatures
(on identities) be private keys of the various identities. The verification of the
QA-NIZK from Section 3 works by checking $e([\, l \mid p \,], \mathrm{CRS}_v) = 0_T^{1 \times s}$ (or more
precisely, $e([\, l \mid p \,], \mathrm{CRS}_v) = f$ for the affine language). However, there are two
issues: (1) $\mathrm{CRS}_v$ needs to be randomized, (2) there are two equations to be verified
(which correspond to the alternate decryption of Cramer-Shoup encryption,
providing implicit simulation-soundness). Both these problems are resolved by
first scaling $\mathrm{CRS}_v$ by a random value $s$, and then taking a linear combination
of the two equations using a public random tag. The right hand side $s \cdot f$ can
then serve as secret one-time pad for encryption. Rather than being a provable
generic construction, this is more a hint to get to a really short IBE. We give
the construction in Appendix A and a complete proof in [15]. It shows an IBE
scheme under the SXDH assumption where the ciphertext has only four group
($G_1$) elements plus a $\mathbb{Z}_q$-tag, which is the shortest IBE known under standard
static assumptions.

Publicly- Verifiable CCA2 Fully-Secure IBE. We can also extend our IBE scheme 
above to be publicly-verifiable CCA2-secure mm- Public verifiability is an in- 
formal but practical notion: most CCA2-secure schemes have a test of well- 
formedness of ciphertext, and on passing the test a CPA-secure scheme style 
decryption suffices. However, if this test can be performed publicly, i.e. without 
access to the secret key, then we call the scheme publicly-verifiable. While there 
is a well known reduction from hierarchical IBE to make an IBE scheme CCA2- 
secure g], that reduction does not make the scheme publicly-verifiable CCA2 
in a useful manner. In the IBE setting, publicly-verifiable also requires that it 
be verifiable if the ciphertext is valid for the claimed identity. This can have 
interesting applications where the network can act as a filter. We show that our 
scheme above can be extended to be publicly-verifiable CCA2-fully-secure IBE 
with only two additional group elements in the ciphertext (and two additional 
group elements in the keys). We give the construction in Appendix B and a complete
proof in [15]. The IBE scheme above has four group elements (and a tag),
where one group element serves as one-time pad for encrypting the plaintext. 
The remaining three group elements form a linear subspace with one variable 
as witness and three integer tags corresponding to: (a) the identity, (b) the tag 
needed in the IBE scheme, and (c) a 1-1 (or universal one-way) hash of some 
of the elements. We show that if these three group elements can be QA-NIZK 
proven to be consistent, and given the unique proof property of our QA-NIZKs, 
then the above IBE scheme can be made CCA2-secure - the dual-system already 
has implicit simulation-soundness as explained in the signature scheme above, 
and we show that this QA-NIZK need not be simulation-sound. Since, there are 
three components, and one variable (see the appendix for details), the QA-NIZK 
requires only two group elements under SXDH.

A Dual System IBE under SXDH Assumption

For ease of reading, we switch to multiplicative group notation in the following.

Setup: The authority uses a group generation algorithm for which the SXDH
assumption holds to generate a bilinear group $(G_1, G_2, G_T)$ with $g_1$ and $g_2$ as
generators of $G_1$ and $G_2$ respectively. Assume that $G_1$ and $G_2$ are of order $q$, and
let $e$ be a bilinear pairing on $G_1 \times G_2$. Then it picks $c$ at random from $\mathbb{Z}_q$, and
sets $f = g_2^c$. It further picks $\Delta_1, \Delta_2, \Delta_3, \Delta_4, b, d, e, u$ from $\mathbb{Z}_q$, and publishes
the following public key PK:

$g_1$, $g_1^b$, $v_1 = g_1^{-\Delta_1 \cdot b + d}$, $v_2 = g_1^{-\Delta_2 \cdot b + e}$, $v_3 = g_1^{-\Delta_3 \cdot b + c}$, and $k = e(g_1, g_2)^{-\Delta_4 \cdot b + u}$.

The authority retains the following master secret key MSK: $g_2$, $f$ $(= g_2^c)$, and
$\Delta_1, \Delta_2, \Delta_3, \Delta_4, d, e, u$.

Encrypt(PK, $i$, $M$). The encryption algorithm chooses $s$ and tag at random
from $\mathbb{Z}_q$. It then blinds $M$ as $C_0 = M \cdot k^s$, and also creates

$$ C_1 = g_1^{s}, \quad C_2 = g_1^{b \cdot s}, \quad C_3 = v_1^{s} \cdot v_2^{i \cdot s} \cdot v_3^{\mathrm{tag} \cdot s} $$

and the ciphertext is $C = (C_0, C_1, C_2, C_3, \mathrm{tag})$.

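KeyGen(MSK, $i$). The authority chooses $r$ at random from $\mathbb{Z}_q$ and creates

$$ R = g_2^{r}, \quad S = g_2^{r \cdot c}, \quad T = g_2^{u + r \cdot (d + i \cdot e)}, \quad W_1 = g_2^{-\Delta_4 - r \cdot (\Delta_1 + i \cdot \Delta_2)}, \quad W_2 = g_2^{-r \cdot \Delta_3} $$

as the secret key $K_i$ for identity $i$.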

Decrypt($K_i$, $C$). Let tag be the tag in $C$. Obtain

$$ \kappa = \frac{e(C_1, S^{\mathrm{tag}} \cdot T) \cdot e(C_2, W_1 \cdot W_2^{\mathrm{tag}})}{e(C_3, R)} $$

and output $C_0 / \kappa$.
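Decryption correctness, namely $\kappa = k^s$, can be checked by tracking exponents. The sketch below is a toy check and not an implementation of the scheme: it assumes every group element is represented by its discrete log modulo a prime `Q`, so multiplication of group elements becomes addition, exponentiation becomes multiplication, and the pairing becomes a product of logs. It also assumes the key components have the form $S = g_2^{rc}$, $T = g_2^{u + r(d + ie)}$, $W_1 = g_2^{-\Delta_4 - r(\Delta_1 + i\Delta_2)}$, $W_2 = g_2^{-r\Delta_3}$. All function names are hypothetical.

```python
import random

Q = 2**61 - 1  # toy prime group order (assumption)
rng = random.Random(3)


def rand():
    return rng.randrange(1, Q)


def setup():
    c, b, d, e, u = (rand() for _ in range(5))
    D1, D2, D3, D4 = (rand() for _ in range(4))          # Delta_1 .. Delta_4
    pk = {
        "g1b": b,                                        # g1^b
        "v1": (-D1 * b + d) % Q,
        "v2": (-D2 * b + e) % Q,
        "v3": (-D3 * b + c) % Q,
        "k": (-D4 * b + u) % Q,                          # log of k in GT
    }
    msk = {"c": c, "d": d, "e": e, "u": u, "D": (D1, D2, D3, D4)}
    return pk, msk


def encrypt(pk, i, M):
    s, tag = rand(), rand()
    C0 = (M + s * pk["k"]) % Q                           # M . k^s
    C1 = s                                               # g1^s
    C2 = s * pk["g1b"] % Q                               # g1^{b s}
    C3 = s * (pk["v1"] + i * pk["v2"] + tag * pk["v3"]) % Q
    return (C0, C1, C2, C3, tag)


def keygen(msk, i):
    r = rand()
    D1, D2, D3, D4 = msk["D"]
    R = r
    S = r * msk["c"] % Q
    T = (msk["u"] + r * (msk["d"] + i * msk["e"])) % Q
    W1 = (-D4 - r * (D1 + i * D2)) % Q
    W2 = (-r * D3) % Q
    return (R, S, T, W1, W2)


def decrypt(key, C):
    R, S, T, W1, W2 = key
    C0, C1, C2, C3, tag = C
    # kappa = e(C1, S^tag . T) . e(C2, W1 . W2^tag) / e(C3, R), in log form
    kappa = (C1 * (tag * S + T) + C2 * (W1 + tag * W2) - C3 * R) % Q
    return (C0 - kappa) % Q                              # C0 / kappa


pk, msk = setup()
M = 123456789                                            # toy GT element (as a log)
C = encrypt(pk, 42, M)
right = decrypt(keygen(msk, 42), C)
wrong = decrypt(keygen(msk, 43), C)
print(right == M, wrong == M)
```

With a key for a different identity the $v_2$ terms no longer cancel, so the recovered value differs from $M$ except with negligible probability.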

Theorem 3. Under the SXDH Assumption, the above scheme is a fully-secure
IBE scheme.

B Publicly Verifiable CCA2-IBE under SXDH Assumption

Setup. The authority uses a group generation algorithm for which the SXDH
assumption holds to generate a bilinear group $(G_1, G_2, G_T)$ with $g_1$ and $g_2$ as
generators of $G_1$ and $G_2$ respectively. Assume that $G_1$ and $G_2$ are of order $q$,
and let $e$ be a bilinear pairing on $G_1 \times G_2$. Then it picks $c$ at random from $\mathbb{Z}_q$,
and sets $f = g_2^c$. It further picks $\Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5, b, d, e, u, z$ from $\mathbb{Z}_q$, and
publishes the following public key PK:

$g_1$, $g_1^b$, $v_1 = g_1^{-\Delta_1 \cdot b + d}$, $v_2 = g_1^{-\Delta_2 \cdot b + e}$, $v_3 = g_1^{-\Delta_3 \cdot b + c}$, $v_4 = g_1^{-\Delta_4 \cdot b + z}$, and
$k = e(g_1, g_2)^{-\Delta_5 \cdot b + u}$.

Consider the language:

$$ L = \{ \langle C_1, C_2, C_3, i, \mathrm{tag}, h \rangle \mid \exists s : C_1 = g_1^{s},\ C_2 = g_1^{b \cdot s},\ C_3 = v_1^{s} \cdot v_2^{i \cdot s} \cdot v_3^{\mathrm{tag} \cdot s} \cdot v_4^{h \cdot s} \} $$

It also publishes the QA-NIZK CRS for the language $L$ (which uses tags $i$, tag
and $h$). It also publishes a 1-1, or Universal One-Way Hash function (UOWHF)
$H$. The authority retains the following master secret key MSK: $g_2$, $f$ $(= g_2^c)$,
and $\Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5, d, e, u, z$.

Encrypt(PK, $i$, $M$). The encryption algorithm chooses $s$ and tag at random
from $\mathbb{Z}_q$. It then blinds $M$ as $C_0 = M \cdot k^s$, and also creates

$$ C_1 = g_1^{s}, \quad C_2 = g_1^{b \cdot s}, \quad C_3 = v_1^{s} \cdot v_2^{i \cdot s} \cdot v_3^{\mathrm{tag} \cdot s} \cdot v_4^{h \cdot s}, $$

where $h = H(C_0, C_1, C_2, \mathrm{tag}, i)$. The ciphertext is then $C = (C_0, C_1, C_2, C_3,
\mathrm{tag}, p_1, p_2)$, where $(p_1, p_2)$ is a QA-NIZK proof that $(C_1, C_2, C_3, i, \mathrm{tag}, h) \in L$.

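KeyGen(MSK, $i$). The authority chooses $r$ at random from $\mathbb{Z}_q$ and creates

$$ R = g_2^{r}, \quad S_1 = g_2^{r \cdot c}, \quad S_2 = g_2^{r \cdot z}, \quad T = g_2^{u + r \cdot (d + i \cdot e)}, $$

$$ W_1 = g_2^{-\Delta_5 - r \cdot (\Delta_1 + i \cdot \Delta_2)}, \quad W_2 = g_2^{-r \cdot \Delta_3}, \quad W_3 = g_2^{-r \cdot \Delta_4} $$

as the secret key $K_i$ for identity $i$.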

Decrypt($K_i$, $C$). Let tag be the tag in $C$. Let $h = H(C_0, C_1, C_2, \mathrm{tag}, i)$. First
(publicly) verify that the ciphertext satisfies the QA-NIZK for the language
above. Then, obtain

$$ \kappa = \frac{e(C_1, S_1^{\mathrm{tag}} \cdot S_2^{h} \cdot T) \cdot e(C_2, W_1 \cdot W_2^{\mathrm{tag}} \cdot W_3^{h})}{e(C_3, R)} $$

and output $C_0 / \kappa$. If the QA-NIZK does not verify, output $\perp$.

This public-verifiability of the consistency test is informally called the
publicly-verifiable CCA2 security.

Theorem 4. Under the SXDH Assumption, the above scheme is a CCA2 fully- 
secure IBE scheme. 
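Correctness of decryption for honestly generated ciphertexts can be confirmed by comparing exponents of $e(g_1, g_2)$. The computation below is a worked check, not part of the paper's proof; it assumes the key components $S_1 = g_2^{rc}$, $S_2 = g_2^{rz}$, $T = g_2^{u + r(d + ie)}$, $W_1 = g_2^{-\Delta_5 - r(\Delta_1 + i\Delta_2)}$, $W_2 = g_2^{-r\Delta_3}$ and $W_3 = g_2^{-r\Delta_4}$.

```latex
\begin{align*}
\log e(C_1, S_1^{\mathrm{tag}} S_2^{h} T)
  &= s\,\big(\mathrm{tag}\cdot rc + h \cdot rz + u + r(d + ie)\big), \\
\log e(C_2, W_1 W_2^{\mathrm{tag}} W_3^{h})
  &= bs\,\big({-\Delta_5} - r(\Delta_1 + i\Delta_2) - \mathrm{tag}\cdot r\Delta_3 - h \cdot r\Delta_4\big), \\
\log e(C_3, R)
  &= sr\,\big((-\Delta_1 b + d) + i(-\Delta_2 b + e)
     + \mathrm{tag}(-\Delta_3 b + c) + h(-\Delta_4 b + z)\big), \\[4pt]
\log \kappa
  &= \log e(C_1, \cdot) + \log e(C_2, \cdot) - \log e(C_3, R)
   = s\,(u - \Delta_5 b) = \log k^{s}.
\end{align*}
```

Every term containing $r$ cancels between the numerator and the denominator, so $C_0 / \kappa = M \cdot k^s / k^s = M$.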



Constant-Round Concurrent Zero Knowledge 
in the Bounded Player Model 

Vipul Goyal 1 , Abhishek Jain 2 , Rafail Ostrovsky 3 , Silas Richelson 4 , 
and Ivan Visconti 5 

1 Microsoft Research, India
vipul@microsoft.com
2 MIT and Boston University, USA
abhishek@csail.mit.edu

3 UCLA, USA
rafail@cs.ucla.edu

4 UCLA, USA
sirichel@math.ucla.edu

5 University of Salerno, Italy
visconti@dia.unisa.it


Abstract. In [18] Goyal et al. introduced the bounded player model for 
secure computation. In the bounded player model, there are an a pri- 
ori bounded number of players in the system, however, each player may 
execute any unbounded (polynomial) number of sessions. They showed 
that even though the model consists of a relatively mild relaxation of 
the standard model, it allows for round-efficient concurrent zero knowl- 
edge. Their protocol requires a super-constant number of rounds. In this 
work we show, constructively, that there exists a constant-round concur- 
rent zero-knowledge argument in the bounded player model. Our result 
relies on a new technique where the simulator obtains a trapdoor corre- 
sponding to a player identity by putting together information obtained 
in multiple sessions. Our protocol is only based on the existence of a
collision-resistant hash-function family and comes with a “straight-line”
simulator.

We note that this constitutes the strongest result known on constant- 
round concurrent zero knowledge in the plain model (under well accepted 
relaxations) and subsumes Barak’s constant-round bounded concurrent 
zero-knowledge result. We view this as a positive step towards getting 
constant round fully concurrent zero-knowledge in the plain model, with- 
out relaxations. 

Keywords: concurrent zero knowledge, straight-line simulation, 
bounded player model. 


1 Introduction 

The notion of a zero-knowledge proof m is central in cryptography, both for 
its conceptual importance and for its wide ranging applications to the design of
secure cryptographic protocols. Initial results for zero-knowledge were in the
so-called stand-alone setting where there is a single protocol execution happening
in isolation.

The fact that on the Internet an adversary can control several players motivated
the notion of concurrent zero knowledge [13] (cZK). Here the prover is
simultaneously involved in several sessions and the scheduling of the messages is
coordinated by the adversary who also keeps control of all verifiers. Concurrent
zero knowledge is much harder to achieve than zero knowledge. Indeed, while we
know how to achieve zero-knowledge in 4 rounds, a sequence of results [2113316]
increased the lower bound on the round complexity of concurrent zero-knowledge
with black-box simulation to almost logarithmic in the security parameter.
Meanwhile, the upper bound has been improved and now almost matches
the logarithmic lower bound [31, 20, 30]. After almost a decade of research on
this topic, the super-logarithmic round concurrent zero-knowledge protocol of
Prabhakaran et al. [30] remains the best known in terms of round complexity.

Some hope for a better round complexity started from the breakthrough result
of Barak [1] where non-black-box simulation under standard assumptions was
proposed. His results showed how to obtain bounded-concurrent zero knowledge
in constant rounds. This refers to the setting where there is an a priori
fixed bound on the total number of concurrent executions (and the protocol
may become completely insecure if the actual number of sessions exceeds this
bound). Unfortunately, since then, the question of achieving sub-logarithmic
round complexity with unbounded concurrency using non-black-box techniques
has remained open, and represents one of the most challenging open questions
in the study of zero-knowledge protocols.¹

Bounded player model. Recently, Goyal, Jain, Ostrovsky, Richelson and Visconti
[18] introduced the so-called bounded player model. In this model, it is only
assumed that there is an a priori (polynomial) upper bound on the total number
of players that may ever participate in protocol executions. There is no setup
stage or trusted party, and the simulation must be performed in polynomial
time. While there is a bound on the number of players, any player may join in
at any time and may be subsequently involved in any unbounded (polynomial)
number of concurrent sessions. Since there is no a priori bound on the number of
sessions, it is a strengthening of the bounded-concurrency model used in Barak’s
result. The bounded player model also has some superficial similarities to the
bare-public-key model of [5] which is discussed later in this section.

As an example, if we consider even a restriction to a single verifier that runs
an unbounded number of sessions, the simulation strategy of [1] breaks down
completely. Goyal et al. [18] gave an $\omega(1)$-round concurrent zero-knowledge pro-
tocol in the bounded player model. The technique they proposed relies on the

1 In this paper, we limit our discussion to results which are based on standard 
complexity-theoretic and number-theoretic assumptions. We note that constant 
round concurrent zero-knowledge is known to exist under non-standard assump- 
tions such as a variation of the (non-falsifiable) knowledge of exponent assumption 
[19] or the existence of P-certificates [8].

fact that the simulator has several choices in every session on where to spend
computation trying to extract a trapdoor, and its running time is guaranteed
to be polynomial as long as the number of such choices is super-constant. Their
technique inherently fails if constant round complexity is desired.

We believe the eventual goal of achieving round efficient concurrent zero- 
knowledge (under accepted assumptions) is an ambitious one. Progress towards 
this goal would not only impact how efficiently one can implement zero-knowledge 
(in the network setting), but also, will improve various secure computation pro- 
tocol constructions in this setting (as several secure computation protocols use, 
e.g., the PRS preamble [30] for concurrent input extraction). The bounded player
model lies somewhere between the standard model (where the best known protocols re-
quire a super-logarithmic number of rounds) and the bounded concurrency model
(where constant-round protocols are known). We believe the study of round com-
plexity of concurrent zero-knowledge in the bounded player model might shed 
light on how to construct such protocols in the standard model as well. 

Our Results. In this work, we give a constant-round protocol in the bounded
player (BP) model. Our construction inherently relies on non-black-box simu-
lation. The simulator for our protocol does not rely on rewinding techniques and
instead works in a “straight-line” manner (as in Barak [1]). Our construction is
based only on the existence of a collision-resistant hash-function family.

Theorem 1. Assuming the existence of a collision-resistant hash-function fam-
ily, there exists a constant-round concurrent zero-knowledge argument system
with concurrent soundness in the bounded player model.

We note that this constitutes the strongest result known on constant-round zero- 
knowledge in the concurrent setting (in the plain model). It subsumes Barak’s 
result: now the total number of sessions no longer needs to be bounded; only 
the number of new players starting the interaction with the prover is bounded. 
A player might join in at any time and may subsequently be involved in any
unbounded (polynomial) number of sessions. 

We further note that, as proved by Goyal et al. [18], unlike previously studied
relaxations of the standard model (e.g., bounded number of sessions, timing
assumptions, super-polynomial simulation), concurrent-secure computation is
still impossible to achieve in the bounded player model. This gives evidence 
that the BP model is “closer” to the standard model than previously studied 
models, and study of this model might shed light on constructing constant-round 
concurrent zero-knowledge in the standard model as well. Moreover, despite 
the impossibility of concurrent-secure computation, techniques developed in the 
concurrent zero-knowledge literature have found applications in other areas in 
cryptography, including resettable security [5], non-malleability, and even
in proving black-box lower bounds.

1.1 Technical Overview 

In this section, we first recall some observations by Goyal et al. [18] regarding why
simple approaches to extend the construction of Barak [1] to the bounded player
model are bound to fail. We also recall the basic idea behind the protocol of [18].
Armed with this background, we then proceed to discuss the key technical ideas
behind our constant-round cZK protocol in the bounded player model. Initial
parts of this section are borrowed verbatim from [18].

Why natural approaches fail. Recall that in the bounded player model, the only
assumption is that the total number of players that will ever be present in the
system is a priori bounded. Then, as observed by Goyal et al. [18], the black-box
lower bound of Canetti et al. [6] is applicable to the bounded player model as
well. Thus, it is clear that we must resort to non-black-box techniques. Now, a
natural approach to leverage the bound on the number of players is to associate
with each verifier $V_i$ a public key $pk_i$ and then design an FLS-style protocol [IB]
that allows the ZK simulator to extract, in a non-black-box manner, the secret
key $sk_i$ of the verifier and then use it as a “trapdoor” for “easy” simulation.
The key intuition is that once the simulator extracts the secret key $sk_i$ of a
verifier $V_i$, it can perform easy simulation of all the sessions associated with $V_i$.
Then, since the total number of verifiers is bounded, the simulator will need
to perform non-black-box extraction only an a priori bounded number of times
(once for each verifier), which can be handled in a manner similar to the setting
of bounded-concurrency [1].

Unfortunately, as observed by Goyal et al. [18], the above intuition is mis- 
leading. In order to understand the problem with the above approach, let us 
first consider a candidate protocol more concretely. In fact, it suffices to focus 
on a preamble phase that enables non-black-box extraction (by the simulator) 
of a verifier’s secret key since the remainder of the protocol can be constructed 
in a straightforward manner following the FLS approach. Now, consider the fol- 
lowing candidate preamble phase (using the non-black-box extraction technique 
of [3]): first, the prover and verifier engage in a coin-tossing protocol where the
prover proves “honest behavior” using a Barak-style non-black-box ZK protocol
[1]. Then, the verifier sends an encryption of its secret key under the public key
that is determined from the output of the coin-tossing protocol [18].

In order to analyze this protocol, we will restrict our discussion to the simpli- 
fied case where only one verifier is present in the system (but the total number of
concurrent sessions is unbounded). At this point, one may immediately object
that in the case of a single verifier identity, the problem is not interesting since
the bounded player model is identical to the bare public key model, where one
can construct four-round cZK protocols using rewinding-based techniques. How-
ever, simulation techniques involving rewinding do not “scale” well to the case of 
polynomially many identities (unless we use a large number of rounds) and fail. 
In contrast, our simulation approach is “straight-line” for an unbounded number 
of sessions and scales well to a large bounded number of identities. Therefore, in 
the forthcoming discussion, we will restrict our discussion to straight-line simu- 
lation. In this case, we find it instructive to focus on the case of a single identity 
to explain the key issues and our ideas to resolve them. 

We now turn to analyze the candidate protocol. Now, following the intuition 
described earlier, one may think that the simulator can simply cheat in the
coin-tossing protocol in the “inner-most” session in order to extract the secret
key, following which all the sessions can be simulated in a straight-line manner, 
without performing any additional non-black-box simulation. Consider, however, 
the following adversarial verifier strategy: the verifier schedules an unbounded 
number of sessions in such a manner that the coin-tossing protocols in all of these 
sessions are executed in a “nested” manner. Furthermore, the verifier sends the 
ciphertext (containing its secret key) in each session only after all the coin-tossing 
protocols across all sessions are completed. Note that in such a scenario, the 
simulator would be forced to perform non-black-box simulation in an unbounded 
number of sessions. Unfortunately, this is a non-trivial problem that we do not 
know how to solve. 
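
The effect of this schedule can be illustrated with a toy count. The sketch below is only an abstraction under our own assumptions: the event names, the flattening of the "nested" coin-tossing phases into a simple sequence, and both function names are hypothetical. It counts how often a naive simulator, which must cheat in every coin-tossing phase that completes before it has seen any ciphertext, is forced into non-black-box simulation.

```python
def nested_schedule(m):
    """Adversarial schedule: all m coin-tossing phases finish before any ciphertext."""
    return [("coin_toss", s) for s in range(m)] + [("ciphertext", s) for s in range(m)]


def sequential_schedule(m):
    """Benign schedule: each session sends its ciphertext right after its coin toss."""
    events = []
    for s in range(m):
        events.append(("coin_toss", s))
        events.append(("ciphertext", s))
    return events


def naive_heavy_simulations(schedule):
    """Count non-black-box simulations performed by the naive simulator (toy model)."""
    has_secret_key = False
    heavy = 0
    for kind, _session in schedule:
        if kind == "coin_toss" and not has_secret_key:
            heavy += 1  # no trapdoor yet: must cheat in this coin toss
        elif kind == "ciphertext":
            has_secret_key = True  # the secret key is learned only here
    return heavy
```

Under the sequential schedule the count stays at one, whereas under the adversarial schedule it grows linearly with the number of sessions, which is exactly the unbounded behaviour described above.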

The approach of Goyal et al. [18]. In an effort to bypass the above problem,
Goyal et al. use multiple ($\omega(1)$, to be precise) preamble phases (instead of only
one), such that the simulator is required to “cheat” in only one of these pream-
bles. This, however, immediately raises a question: in which of the $\omega(1)$ pream-
bles should the simulator cheat? This is a delicate question since if, for example,
we let the simulator pick one of the preambles uniformly at random, then with
non-negligible probability, the simulator will end up choosing the first preamble
phase. In this case, the adversary can simply perform the same attack as it did 
earlier playing only the first preamble phase, but for many different sessions so 
that the simulator will still have to cheat in many of them. Indeed, it would seem 
that any randomized oblivious simulation strategy can be attacked in a similar 
manner by simply identifying the first preamble phase where the simulator would 
cheat with a non-negligible probability. 

The main idea in [18] is to use a specific probability distribution such that
the simulator cheats in the first preamble phase with only negligible probabil-
ity, while the probability of cheating in the later preambles increases gradually
such that the “overall” probability of cheating is 1 (as required). Further, the
distribution is such that the probability of cheating in the $i$-th preamble is less
than a fixed polynomial factor of the total probability of cheating in one of the
previous $i - 1$ blocks. This allows them (by a careful choice of parameters) to
ensure that the probability of the simulator failing in more than a given poly-
nomially bounded number of sessions w.r.t. any given verifier is negligible (and
then rely on the techniques from the bounded-concurrency model [1] to handle
the bounded number of non-black-box simulations).
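
To make the shape of such a distribution concrete, the following sketch builds one example that satisfies the constraints just listed. It is an illustrative assumption on our part and not necessarily the distribution used in [18]: with $k$ preambles and base $n$, we set $p_1 = n^{-(k-1)}$ and $p_i = (n-1) \cdot n^{i-2} / n^{k-1}$ for $i \ge 2$. The function name and the parameters n and k are hypothetical.

```python
from fractions import Fraction


def cheating_distribution(n: int, k: int) -> list[Fraction]:
    """Example distribution over k preambles (an assumption, not taken from [18]).

    p_1 = n^-(k-1); for i >= 2, p_i equals (n - 1) times the total mass of
    the preambles before it, so it stays below a factor n of that mass.
    """
    denom = n ** (k - 1)
    probs = [Fraction(1, denom)]
    for i in range(2, k + 1):
        probs.append(Fraction((n - 1) * n ** (i - 2), denom))
    return probs


def check_constraints(n: int, k: int) -> bool:
    """Verify the growth condition and that the probabilities sum to 1."""
    prefix = Fraction(0)
    for i, p in enumerate(cheating_distribution(n, k), start=1):
        if i > 1 and p >= n * prefix:
            return False
        prefix += p
    return prefix == 1
```

In this example the first preamble is chosen with probability $n^{-(k-1)}$, which is negligible once $k = \omega(1)$, while the probabilities still sum to exactly 1.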

Our Construction. The techniques used in our work are quite different from and un-
related to the techniques in the work of Goyal et al. [18]. As illustrated in the
discussion above, the key issue is the following. Say that a slot of the protocol 
completes. Then, the simulator starts the non-black-box simulation and com- 
putes the first “heavy” universal argument message and sends it across. How-
ever, before the simulator can finish this simulation successfully (and somehow 
learn a trapdoor from the verifier which can then be used to complete other ses- 
sions without non-black-box simulation), the verifier switches to another session. 
Then, in order to proceed, the simulator would have to perform non-black-box
simulation and the heavy computation again (resulting in the number of ses-
sions where non-black-box simulation is performed becoming unbounded). So
overall, the problem is the “delay” between the heavy computation and the
point at which the simulator extracts the verifier trapdoor (which can then be 
used to quickly pass through other sessions with this particular verifier without 
any heavy computation or non-black-box simulation). 

Our basic approach is to “construct the trapdoor slowly as we go along” : have 
any heavy computation done in any session (with this verifier) contribute to the 
construction of a trapdoor which can then be used to quickly pass through other 
sessions. To illustrate our idea, we shall focus on the case of a single verifier as 
before. The description below is slightly oversimplified for the sake of readability. 

To start with, in the very first session, the verifier is supposed to choose a key 
pair of a signature scheme (this key pair remains the same across all sessions 
involving this verifier). As in Barak’s protocol [1], we will just have a single slot
followed by a universal argument (UA). However, now once a slot is complete,
the verifier is required to immediately send a signature² on the transcript of
the slot (i.e., on the prover commitment and the verifier random string) to the
prover. This slot now constitutes a “hard statement” certified by the verifier: 
it could be used by the prover in any session (with this verifier). If the prover 
could prove that he has a signed slot such that the machine committed to in 
this slot could output the verifier random string in this slot, the verifier would 
be instructed to accept. Thus, the simulator would now simply take the first 
slot that completes (across all sessions), and, would prove the resulting “hard 
statement” in the universal arguments of all the sessions. This would allow him 
to presumably compute the required PCP only once and use it across all sessions. 
Are we done? Turns out that the answer is no. 

Even if the prover is executing the UA corresponding to the same slot (on 
which he has obtained a signature) in every session, because of the interactive 
nature of UAs, the (heavy) computation the prover does in a session cannot 
be entirely used in another session. This is because the challenge of the verifier 
would be different in different sessions. To solve this problem and continue the 
construction of a single trapdoor (useful across all sessions), we apply our ba- 
sic idea one more time. The prover computes and sends the first UA message. 
The verifier is required to respond with a random challenge and a signature on 
the UA transcript so far. The prover can compute the final UA message, and the
construction of the trapdoor is complete: the trapdoor consists of a signed
slot, an accepting UA transcript (proving that the machine committed to in the 
slot indeed outputs the random string in that slot), and, a signature on the first 
two UA messages (proving that the challenge was indeed generated by the veri- 
fier after getting the first UA message). To summarize, the simulator would use 
the following two sessions for the construction of the trapdoor: the first session 


2 Signatures of committed messages computed by a verifier were previously used
in [12] to allow the simulator to obtain, through rewinding, one more signature in order
to cheat in the main thread. Here, instead, we insist on straight-line simulation.

where a slot completes, and the first session where the verifier sends the UA
random challenge.

The above idea indeed is oversimplified and ignores several problems. Firstly, 
since an honest prover executes each concurrent session oblivious of others, any 
correlations in the prover messages across different sessions (in particular, send- 
ing the same UA first message) would lead to the simulated transcript being 
distinguishable from the real one. Furthermore, the prover could be proving a 
different overall statement to the verifier in every session (and hence even a UA 
first message cannot be reused across different sessions). The detailed description
of our construction is given in Section 3.


1.2 Related Work 

Bare public key and other related models. The bare public key model was pro- 
posed in [5] where, before any interaction starts, every player is required to 
declare a public key and store it in a public file (which never changes once 
the sessions start). In this model it is known how to obtain constant-round 
concurrent zero knowledge with concurrent soundness under standard assump- 
tions [13135136151] . This model has also been used for constant-round concurrent 
non-malleable zero knowledge m and various constant-round resettable and 
simultaneously resettable protocols [221391111911013813717] . 

As discussed in [18], the crucial restriction of the BPK model is that all players
who wish to ever participate in protocol executions must be fixed during the pre- 
processing phase, and new players cannot be added “on-the-fly” during the proof 
phase. We do not make such a restriction in our work and, despite superficial 
resemblance, the techniques useful in constructing secure protocols in the BPK 
model have limited relevance in our setting. In particular, constant round cZK is 
known to exist in the BPK model using only black-box simulation, while in our 
setting, non-black-box techniques are necessary to achieve sublogarithmic-round 
cZK. 

In light of the above discussion, since the very premise of the BPK model 
(that all players are fixed ahead of time and declare a key) does not hold in the 
bounded player model, we believe that the bounded player model is much closer 
in spirit (as well as technically) to the bounded concurrency model of Barak. 
The bounded player model is a strict generalization of the bounded concurrency 
model. Thus, our constant-round construction is the first strict improvement 
to Barak’s bounded concurrent ZK protocol. We stress that we improve the 
achieved security under concurrent composition, still under standard assump- 
tions and without introducing any setup/weakness. Summing up, ours is a con- 
struction which is the closest known to achieving constant-round concurrent zero 
knowledge in the plain model. 

Round efficient concurrent zero-knowledge is known in a number of other 
models as well (which do not seem to be directly relevant to our setting) such 
as the common-reference string model, the super-polynomial simulation model, 
etc. We refer the reader to [18] for a more detailed discussion.



2 Preliminaries and Definitions 

Notation. We will use the symbol “||” to denote the concatenation of two strings 
appearing respectively before and after the symbol. 

2.1 Bounded Player Model 

We first recall the bounded player model for concurrent security, as introduced 
in [18]. In the bounded player model, there is an a priori (polynomial) upper
bound on the total number of players that will ever be present in the system.
Specifically, let $n$ denote the security parameter. Then, we consider an upper
bound $N = \mathrm{poly}(n)$ on the total number of players that can engage in concurrent
executions of a protocol at any time. We assume that each player $P_i$ ($i \in [N]$) has
an associated unique identity $id_i$, and that there is an established mechanism to
enforce that party $P_i$ uses the same identity $id_i$ in each protocol execution that it
participates in. Note, however, that such identities do not have to be established
in advance. In particular, new players can join the system with their own (new)
identities, as long as the number of players does not exceed $N$. We stress that
there is no bound on the number of protocol executions that can be started by
each party.

The bounded player model is formalized by means of a functionality $F_{bp}^N$
that registers the identities of the players in the system. Specifically, a player
$P_i$ that wishes to participate in protocol executions can, at any time, register
an identity $id_i$ with the functionality $F_{bp}^N$. The registration functionality does
not perform any checks on the identities that are registered, except that each
party $P_i$ can register at most one identity $id_i$, and that the total number of
identity registrations is bounded by $N$. In other words, $F_{bp}^N$ refuses to register
any new identities once $N$ identities have already been registered.
The functionality $F_{bp}^N$ is formally defined in Figure 1.


Functionality $F_{bp}^N$

$F_{bp}^N$ initializes a variable count to 0 and proceeds as follows.

- Register commands: Upon receiving a message (register, sid, $id_i$) from some
  party $P_i$, the functionality checks that no pair ($P_i$, $id'$) is already recorded and
  that count < $N$. If this is the case, it records the pair ($P_i$, $id_i$) and sets count =
  count + 1. Otherwise, it ignores the received message.

- Retrieve commands: Upon receiving a message (retrieve, sid, $P_i$) from some
  party $P_j$ or the adversary $A$, the functionality checks if some pair ($P_i$, $id_i$) is
  recorded. If this is the case, it sends (sid, $P_i$, $id_i$) to $P_j$ (or $A$). Otherwise, it returns
  (sid, $P_i$, $\bot$).


Fig. 1. The Bounded Player Functionality $F_{bp}^N$

In our constructions we will explicitly work in the setting where the identity
of each party is a tuple $(h, vk)$, where $h \leftarrow H_n$ is a hash function chosen from a
family $H_n$ of collision-resistant hash functions, and $vk$ is a verification key for a
signature scheme.
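
The register and retrieve commands of Figure 1 can be sketched as a small in-memory object. This is only a minimal illustration: the class and method names are our own, identities are treated as opaque strings rather than tuples $(h, vk)$, and `None` plays the role of $\bot$.

```python
class BoundedPlayerFunctionality:
    """Toy model of the registration functionality with bound N (names are hypothetical)."""

    def __init__(self, max_players: int) -> None:
        self.max_players = max_players  # the a priori bound N
        self.count = 0
        self.records: dict[str, str] = {}  # party name -> registered identity

    def register(self, sid: str, party: str, identity: str) -> bool:
        """Record (party, identity) if the party is new and count < N; else ignore."""
        if party in self.records or self.count >= self.max_players:
            return False  # the received message is ignored
        self.records[party] = identity
        self.count += 1
        return True

    def retrieve(self, sid: str, party: str) -> tuple:
        """Return (sid, party, identity), with None standing in for the bottom symbol."""
        return (sid, party, self.records.get(party))
```

Note that nothing in this object limits how many sessions a registered party may later start; only the number of distinct registrations is bounded.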


2.2 Concurrent Zero Knowledge in Bounded Player Model 

In this section, we formally define concurrent zero knowledge in the bounded 
player model. The definition given below is an adaptation of the one of [30] to
the bounded player model, also considering non-black-box simulation. Some
of the text below is taken verbatim from [30].

Let ppt denote probabilistic polynomial time. Let $\langle P, V \rangle$ be an interactive
argument for a language $L$. Consider a concurrent adversarial verifier $V^*$ that,
given input $x \in L$, interacts with an unbounded number of independent copies
of $P$ (all on the same common input $x$ and moreover equipped with a proper
witness $w$), without any restriction over the scheduling of the messages in the
different interactions with $P$. In particular, $V^*$ has control over the scheduling
of the messages in these interactions. Further, we say that $V^*$ is an $N$-bounded
concurrent adversary if it assumes at most $N$ verifier identities during its (un-
bounded) interactions with $P$.³

The transcript of a concurrent interaction consists of the common input x, 
followed by the sequence of prover and verifier messages exchanged during the 
interaction. We denote by $\mathrm{view}_{V^*}(x, z, N)$ the random variable describing the
content of the random tape of the $N$-bounded concurrent adversary $V^*$ with
auxiliary input $z$ and the transcript of the concurrent interaction between $P$
and $V^*$ on common input $x$.

Definition 1 (Concurrent Zero Knowledge in Bounded Player Model).
Let $\langle P, V \rangle$ be an interactive argument system for a language $L$. We say that
$\langle P, V \rangle$ is concurrent zero-knowledge in the bounded player model if for every
$N$-bounded concurrent non-uniform ppt adversary $V^*$, there exists a ppt algo-
rithm $S$ such that the following ensembles are computationally indistinguishable:
$\{\mathrm{view}_{V^*}(x, z, N)\}_{x \in L,\, z \in \{0,1\}^*}$ and $\{S(x, z, N)\}_{x \in L,\, z \in \{0,1\}^*}$.

As a final note, we remark that following previous work in the BPK model and 
in the BP model, we will consider the notion of concurrent soundness where the 
malicious prover is allowed to play any concurrent number of sessions with the 
same verifier. Indeed, this notion is strictly stronger than sequential soundness.


2.3 Building Blocks 

In this section, we discuss the main building blocks that we will use in our cZK 
construction. 


3 Thus, $V^*$ can open multiple sessions with $P$ for every unique verifier identity.

Statistically binding commitment schemes. In our constructions, we will make
use of a statistically binding string commitment scheme, denoted Com. For sim-
plicity of exposition, we will assume that Com is a
non-interactive perfectly binding commitment scheme. In reality, Com would
be taken to be a standard 2-round commitment scheme. Unless stated
otherwise, we will simply use the notation Com($x$) to denote a commitment
to a string $x$, and assume that the randomness (used to create the commit-
ment) is implicit. We will denote by Com($x$; $r$) a commitment to a string $x$ with
randomness $r$.

Witness indistinguishable arguments of knowledge. We will also make use of a
witness-indistinguishable proof of knowledge (WIPOK) for all of NP in our con-
struction. Such a scheme can be constructed, for example, by parallel repetition
of Blum’s 3-round protocol for Graph Hamiltonicity [4]. We will denote such
an argument system by $\langle P_{WI}, V_{WI} \rangle$.

The universal argument of [2]. In our construction, we will use the 4-round
universal argument system (UA), denoted pUA, presented in [2] and based on
the existence of collision-resistant hash functions. We will assume without loss 
of generality that the initial commitment of the PCP sent by the prover in 
the second round also contains a commitment of the statement. We notice that 
such an argument system is still sound when the prover is required to open the 
commitment of the statement in the very last round. 

Signature schemes. We will use a signature scheme (KeyGen, Sign, Verify) 
that is unforgeable against chosen message attacks. Note that such signature
schemes are known based on one-way functions [32].

3 A Constant-Round Protocol 

In this section, we describe our constant-round concurrent zero-knowledge pro- 
tocol in the bounded player model. 

Relation $R_{sim}$. We first recall a slight variant of Barak’s [1] $\mathrm{NTIME}(T(n))$
relation $R_{sim}$, as used previously in [28]. Let $T : \mathbb{N} \to \mathbb{N}$ be a “nice” function
that satisfies $T(n) = n^{\omega(1)}$. Let $\{H_n\}_n$ be a family of collision-resistant hash
functions where a function $h \in H_n$ maps $\{0,1\}^*$ to $\{0,1\}^n$, and let Com be
a perfectly binding commitment scheme for strings of length $n$, where for any
$\alpha \in \{0,1\}^n$, the length of Com($\alpha$) is upper bounded by $2n$. The relation $R_{sim}$
is described in Figure 2.

Remark 1. The relation presented in Figure 2 is slightly oversimplified and
will make Barak’s protocol work only when $\{H_n\}_n$ is collision-resistant against
“slightly” super-polynomial sized circuits. For simplicity of exposition, in this
manuscript, we will work with this assumption. We stress, however, that as dis-
cussed in prior works [2,26,29,28,18], this assumption can be relaxed by using
a “good” error-correcting code ECC (with constant distance and polynomial-
time encoding and decoding procedures), and replacing the condition
$c = \mathrm{Com}(h(\Pi); s)$ with $c = \mathrm{Com}(\mathrm{ECC}(h(\Pi)); s)$.


Instance: A triplet $\langle h, c, r \rangle \in H_n \times \{0,1\}^n \times \{0,1\}^{\mathrm{poly}(n)}$.

Witness: A program $\Pi \in \{0,1\}^*$, a string $y \in \{0,1\}^*$ and a string $s \in \{0,1\}^{\mathrm{poly}(n)}$.

Relation: $R_{sim}(\langle h, c, r \rangle, \langle \Pi, y, s \rangle) = 1$ if and only if:

1. $|y| \le |r| - n$.

2. $c = \mathrm{Com}(h(\Pi); s)$.

3. $\Pi(y) = r$ within $T(n)$ steps.


Fig. 2. $R_{sim}$ - A variant of Barak’s relation [28]

Our protocol. We are now ready to present our concurrent zero-knowledge proto-
col, denoted $\langle P, V \rangle$. Let $P$ and $V$ denote the prover and verifier respectively. Let
$N$ denote the bound on the number of verifiers in the system. In our construction,
the identity of a verifier $V_i$ corresponds to a verification key $vk_i$ of a secure signa-
ture scheme and a hash function $h_i \in H_n$ from a family $H_n$ of collision-resistant
hash functions. Let (KeyGen, Sign, Verify) be a secure signature scheme. Let
$\langle P_{WI}, V_{WI} \rangle$ be a witness-indistinguishable argument of knowledge system. Let
pUA be the universal argument (UARG) system of [2] that we discussed pre-
viously; the transcript is composed of four messages $(h, \beta, \gamma, \delta)$, where $h$ is a
collision-resistant hash function.

The protocol $\langle P, V \rangle$ is described in Figure 3. For our purposes, we set the
length parameter $\ell(N) = N \cdot P(n) + n$, where $P(n)$ is a polynomial upper bound
on the total length of the prover messages in the UARG pUA plus the output
length of a hash function $h \in H_n$. For simplicity we omit some standard checks
(e.g., the prover needs to check that $vk$ and $h$ are recorded, and the prover needs to
check that the signatures are valid).

The completeness property of ( P , V) follows immediately from the construc- 
tion. Next, in Section [TUI we prove concurrent soundness of (P,V), i.e., we 
show that a computationally-bounded adversarial prover who engages in multi- 
ple concurrent executions of (P, V) (where the scheduling across the sessions is 
controlled by the adversary) can not prove a false statement in any of the ex- 
ecutions, except with negligible probability. As observed in [18], “stand-alone” 
soundness does not imply concurrent soundness in the bounded player model. 
Informally speaking, this is because the standard approach of reducing concur- 
rent soundness to stand-alone soundness by “internally” emulating all but one 
verifier does not work since the verifier’s keys are private.⁴

4 Indeed, Micali and Reyzin [23] gave concrete counter-examples to show that stand-
alone soundness does not imply concurrent soundness in the bare public key model.
It is not difficult to see that their results immediately extend to the bounded player
model.


Parameters: Security parameter $n$, number of players $N = N(n)$, length parameter
$\ell(N)$.

Common Input: $x \in \{0,1\}^{\mathrm{poly}(n)}$.

Private Input to $P$: A witness $w$ s.t. $R_L(x, w) = 1$.

Private Input to $V$: A key pair $(sk, vk) \leftarrow \mathrm{KeyGen}(1^n)$, and a hash function
$h \leftarrow H_n$.

Stage 1 (Preamble Phase):

$V \to P$: Send $vk$, $h$.

$P \to V$: Send $c = \mathrm{Com}(0^n)$.

$V \to P$: Send $r \leftarrow \{0,1\}^{\ell(N)}$ and $\sigma = \mathrm{Sign}_{sk}(c \| r)$.

$P \to V$: Send $c' = \mathrm{Com}(0^n)$.

$V \to P$: Send $\gamma \leftarrow \{0,1\}^n$, and $\sigma' = \mathrm{Sign}_{sk}(c' \| \gamma)$.

Stage 2 (Proof Phase):

$P \leftrightarrow V$: An execution of WIPOK $\langle P_{WI}, V_{WI} \rangle$ to prove the OR of the following
statements:

1. $\exists\, w \in \{0,1\}^{\mathrm{poly}(|x|)}$ s.t. $R_L(x, w) = 1$.

2. $\exists\, \langle c, r, \sigma \rangle$, and $\langle \beta, \gamma, \delta, c', t, \sigma' \rangle$ s.t.

   - $\mathrm{Verify}_{vk}(c \| r; \sigma) = 1$, and

   - $c' = \mathrm{Com}(\beta; t)$, and $\mathrm{Verify}_{vk}(c' \| \gamma; \sigma') = 1$, and

   - $(h, \beta, \gamma, \delta)$ is an accepting transcript for a UARG pUA proving the
     following statement: $\exists\, \langle \Pi, y, s \rangle$ s.t. $R_{sim}(\langle h, c, r \rangle, \langle \Pi, y, s \rangle) = 1$.


Fig. 3. Protocol $\langle P, V \rangle$
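
The message flow of Stage 1 in Figure 3 can be sketched with toy stand-ins. Everything below is an assumption made for illustration only: the hash-based `commit` is not perfectly binding, HMAC is a symmetric MAC rather than a public-key signature (so the toy `verify` needs the secret key), lengths are measured in bits and rounded to bytes, and all function names are hypothetical. The sketch only shows which strings are committed to, sampled, and signed.

```python
import hashlib
import hmac
import secrets

N_BITS = 128  # toy security parameter n, in bits


def length_parameter(num_players: int, ua_prover_len: int, n: int) -> int:
    """l(N) = N * P(n) + n, as set in the text (all quantities in bits)."""
    return num_players * ua_prover_len + n


def commit(message: bytes, randomness: bytes) -> bytes:
    # Stand-in for Com: a hash-based commitment, NOT perfectly binding.
    return hashlib.sha256(randomness + message).digest()


def sign(sk: bytes, message: bytes) -> bytes:
    # Stand-in for Sign_sk: HMAC is a MAC, not a public-key signature.
    return hmac.new(sk, message, hashlib.sha256).digest()


def verify(sk: bytes, message: bytes, tag: bytes) -> bool:
    # Stand-in for Verify_vk; a real scheme would use the public key vk.
    return hmac.compare_digest(sign(sk, message), tag)


def run_preamble(sk: bytes, num_players: int, ua_prover_len: int) -> dict:
    """Run the commit/challenge/sign exchanges of Stage 1 for one session."""
    n_bytes = N_BITS // 8
    ell_bytes = length_parameter(num_players, ua_prover_len, N_BITS) // 8
    zero = bytes(n_bytes)                                   # the string 0^n
    c = commit(zero, secrets.token_bytes(n_bytes))          # P -> V
    r = secrets.token_bytes(ell_bytes)                      # V -> P
    sigma = sign(sk, c + r)                                 # signature on c || r
    c_prime = commit(zero, secrets.token_bytes(n_bytes))    # P -> V
    gamma = secrets.token_bytes(n_bytes)                    # V -> P
    sigma_prime = sign(sk, c_prime + gamma)                 # signature on c' || gamma
    return {"c": c, "r": r, "sigma": sigma,
            "c_prime": c_prime, "gamma": gamma, "sigma_prime": sigma_prime}
```

The challenge $r$ is much longer than $\gamma$ because its length $\ell(N)$ grows linearly with the bound $N$ on the number of players.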


We now turn to prove that protocol (P, V) is concurrent zero-knowledge in 
the bounded player model. 

3.1 Proof of Concurrent Zero Knowledge 

In this section, we prove that the protocol $\langle P, V \rangle$ described in Section 3 is con-
current zero-knowledge in the bounded player model. Towards this end, we will 
construct a non-black-box (polynomial-time) simulator and then prove that the 
concurrent adversary’s view output by the simulator is indistinguishable from 
the real view. We start by giving an overview of the proof and then proceed to 
give details. 

Overview. Recall that unlike the bounded concurrency model, the main chal- 
lenge in the bounded player model is that the total number of sessions that a 
concurrent verifier may schedule is not a priori bounded. Thus, one can not di- 
rectly employ Barak’s simulation strategy of committing to a machine that takes 
only a bounded-length input y (smaller than the challenge string r) and outputs 
the next message of the verifier. Towards this end, the crucial observation in [18]
is that in the bounded player model, once the simulator is able to “solve” the
identity of a specific verifier, it does not need to perform any more
“expensive” (Barak-style) non-black-box simulation for that identity. Then, the
main challenge remaining is to ensure that the expensive non-black-box sim-
ulations that need to be performed before the simulator can solve a particular
identity can be a priori bounded, regardless of the number of concurrent sessions
opened by the verifier. Indeed, [18] uses a randomized simulation strategy (that
crucially relies on a super-constant number of rounds) to achieve this effect.

In our case, we also build on the same set of observations. However, we
crucially follow a different strategy to a priori bound the number of expensive
non-black-box simulations that need to be performed in order to solve a given
identity. In particular, unlike [18], where the “trapdoor” for a given verifier
simply corresponds to its secret key, in our case, the trapdoor consists of a
signed statement and a corresponding universal argument proof transcript (where
the signature is computed by the verifier using the signing key corresponding to
its identity). Further, and more crucially, unlike [18], where the simulator
makes a “disjoint” effort in each session corresponding to a verifier to extract
the trapdoor, in our case, the simulator gradually builds the trapdoor by making
a “joint” effort across the sessions. In fact, our simulator only performs one
expensive non-black-box simulation per identity; as such, the a priori bound on
the number of identities immediately yields the desired effect. Indeed, this is
why we can perform concurrent simulation in only a constant number of rounds.

The Simulator. We now proceed to describe our simulator S. Let N denote the
a priori bound on the number of verifiers in the system. Then, the simulator S
interacts with an adversary $V^* = (V_1^*, \ldots, V_N^*)$ who controls verifiers
$V_1, \ldots, V_N$. $V^*$ interacts with S in m sessions, and controls the scheduling
of the messages. S is given non-black-box access to $V^*$.

The simulator S consists of two main subroutines, namely, $S_{\mathsf{easy}}$ and
$S_{\mathsf{heavy}}$. As the name suggests, the job of $S_{\mathsf{heavy}}$ is to perform the
“expensive” non-black-box simulation operations, namely, constructing the
transcripts of universal arguments, which yield a trapdoor for every verifier
$V_i$. On the other hand, $S_{\mathsf{easy}}$ computes the actual (simulated) prover messages
in both the preamble phase and the proof phase, by using the trapdoors. We now
give more details.

Simulator S. Throughout the simulation, S maintains the following three data
structures, each of which is initialized to $\bot$:

1. a list $\pi = (\pi_1, \ldots, \pi_N)$, where each $\pi_i$ is either $\bot$ or is computed to be
   $h_i(\Pi)$. Here, $h_i$ is the hash function corresponding to $V_i$ and $\Pi$ is the
   augmented machine code that is used for non-black-box simulation. We defer
   the description of $\Pi$ to below.

2. a list $\mathsf{trap}_{\mathsf{heavy}} = (\mathsf{trap}^{\mathsf{heavy}}_1, \ldots, \mathsf{trap}^{\mathsf{heavy}}_N)$, where each
   $\mathsf{trap}^{\mathsf{heavy}}_i$ corresponds to a tuple $(h_i, c, r, \Pi, y, s)$ s.t.
   $R_{\mathsf{sim}}((h_i, c, r), (\Pi, y, s)) = 1$.

3. a list $\mathsf{trap}_{\mathsf{easy}} = (\mathsf{trap}^{\mathsf{easy}}_1, \ldots, \mathsf{trap}^{\mathsf{easy}}_N)$, where each
   $\mathsf{trap}^{\mathsf{easy}}_i$ corresponds to a tuple $(c, r, \sigma, \beta, \gamma, \delta, c', t, \sigma')$ s.t.

   - $\mathsf{Verify}_{vk_i}(c \| r; \sigma) = 1$, and

   - $c' = \mathsf{Com}(\beta; t)$, and $\mathsf{Verify}_{vk_i}(c' \| \gamma; \sigma') = 1$, and

   - $(h_i, \beta, \gamma, \delta)$ is an accepting transcript for a UARG pUA proving the
     following statement: $\exists (\Pi, y, s)$ s.t. $R_{\mathsf{sim}}((h_i, c, r), (\Pi, y, s)) = 1$.
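The three lists above can be pictured as plain records. The following is a minimal sketch, not the authors' implementation: the class and field names (`HeavyTrapdoor`, `EasyTrapdoor`, `SimulatorState`, `init_state`) are hypothetical, every value is treated as an opaque byte string, and `None` stands in for $\bot$.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class HeavyTrapdoor:
    """Tuple (h_i, c, r, Pi, y, s); a slot that is None plays the role of the empty slot."""
    h: Optional[bytes] = None
    c: Optional[bytes] = None
    r: Optional[bytes] = None
    Pi: Optional[bytes] = None
    y: Optional[bytes] = None
    s: Optional[bytes] = None


@dataclass
class EasyTrapdoor:
    """Tuple (c, r, sigma, beta, gamma, delta, c', t, sigma'), in slot order 1..9."""
    c: Optional[bytes] = None
    r: Optional[bytes] = None
    sigma: Optional[bytes] = None
    beta: Optional[bytes] = None
    gamma: Optional[bytes] = None
    delta: Optional[bytes] = None
    c_prime: Optional[bytes] = None
    t: Optional[bytes] = None
    sigma_prime: Optional[bytes] = None

    def is_complete(self) -> bool:
        # The trapdoor is usable as a WIPOK witness only once all nine slots are set.
        return all(getattr(self, f.name) is not None for f in fields(self))


@dataclass
class SimulatorState:
    """The lists pi, trap_heavy and trap_easy, one entry per verifier identity."""
    pi: List[Optional[bytes]]
    trap_heavy: List[Optional[HeavyTrapdoor]]
    trap_easy: List[Optional[EasyTrapdoor]]


def init_state(num_verifiers: int) -> SimulatorState:
    # Every entry starts out unset, mirroring "initialized to bottom".
    return SimulatorState(
        pi=[None] * num_verifiers,
        trap_heavy=[None] * num_verifiers,
        trap_easy=[None] * num_verifiers,
    )
```

Because each list has exactly N entries, the state grows with the number of identities and not with the number of sessions, which is the property the overview relies on.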

Augmented machine $\Pi$. The augmented machine code $\Pi$ simply consists of the
code of the adversarial verifier $V^*$ and the code of the subroutine $S_{\mathsf{easy}}$ (with a
sufficiently long random tape hardwired, to compute the prover messages in each
session), i.e., $\Pi = (V^*, S_{\mathsf{easy}})$. The input y to the machine $\Pi$ consists of the lists
$\pi$ and $\mathsf{trap}_{\mathsf{easy}}$, i.e., $y = (\pi, \mathsf{trap}_{\mathsf{easy}})$. Note that it follows from the description
that $|y| < \ell(N) - n$.

We now describe the subroutines $S_{\mathsf{easy}}$ and $S_{\mathsf{heavy}}$, and then proceed to give a
formal description of S. For simplicity of exposition, in the discussion below, we
assume that the verifier sends the first message in the WIPOK $(P_{WI}, V_{WI})$.

Algorithm $S_{\mathsf{easy}}(i, \mathsf{msg}^v_j, \pi, \mathsf{trap}_{\mathsf{easy}}; z)$. The algorithm $S_{\mathsf{easy}}$ prepares the
(simulated) messages of the prover P in the protocol. More specifically, when
executed with input $(i, \mathsf{msg}^v_j, \pi, \mathsf{trap}_{\mathsf{easy}}; z)$, $S_{\mathsf{easy}}$ does the following:

1. If $\mathsf{msg}^v_j$ is the first verifier message of the preamble phase from $V_i$ in a
   session, then $S_{\mathsf{easy}}$ parses $\pi$ as $\pi_1, \ldots, \pi_N$. It computes and outputs
   $c = \mathsf{Com}(\pi_i; z)$.

2. If $\mathsf{msg}^v_j$ is the second verifier message of the preamble phase from $V_i$ in a
   session, then $S_{\mathsf{easy}}$ computes and outputs $c' = \mathsf{Com}(\beta; z)$, where $\beta$ is the
   corresponding (i.e., fourth) entry in $\mathsf{trap}^{\mathsf{easy}}_i \in \mathsf{trap}_{\mathsf{easy}}$.

3. If $\mathsf{msg}^v_j$ is a verifier message of the WIPOK from $V_i$ in the proof phase of
   a session, then if $\mathsf{trap}^{\mathsf{easy}}_i = \bot$, $S_{\mathsf{easy}}$ aborts and outputs $\bot$; otherwise
   $S_{\mathsf{easy}}$ simply runs the code of the honest $P_{WI}$ to compute the response using
   randomness z and the trapdoor witness $\mathsf{trap}^{\mathsf{easy}}_i$.

Algorithm $S_{\mathsf{heavy}}(i, j, \gamma, \mathsf{trap}_{\mathsf{heavy}})$. The algorithm $S_{\mathsf{heavy}}$ simply prepares one
UARG transcript for every verifier $V_i$, which in turn is used as a trapdoor by
the algorithm $S_{\mathsf{easy}}$. More concretely, when executed with input
$(i, j, \gamma, \mathsf{trap}_{\mathsf{heavy}})$, $S_{\mathsf{heavy}}$ does the following:

1. If $j = 1$, then $S_{\mathsf{heavy}}$ parses the i-th entry $\mathsf{trap}^{\mathsf{heavy}}_i$ in $\mathsf{trap}_{\mathsf{heavy}}$ as
   $(h_i, c, r, \Pi, y, s)$. It runs the honest prover algorithm $P_{UA}$ and computes the
   first message $\beta$ of a UARG for the statement: $\exists (\Pi, y, s)$ s.t.
   $R_{\mathsf{sim}}((h_i, c, r), (\Pi, y, s)) = 1$. $S_{\mathsf{heavy}}$ saves its internal state as
   $\mathsf{state}_i$ and outputs $\beta$.$^5$

2. If $j = 2$, then $S_{\mathsf{heavy}}$ uses $\mathsf{state}_i$ and $\gamma$ to honestly compute the final prover
   message $\delta$ for the UARG with prefix $(h_i, \beta, \gamma)$. It outputs $\delta$.

Algorithm S. Given the above subroutines, the simulator S works as follows.
We assume that every time S updates the lists $\pi$ and $\mathsf{trap}_{\mathsf{easy}}$, it also
automatically updates the entry corresponding to y (i.e., the fifth entry) in each
$\mathsf{trap}^{\mathsf{heavy}}_i \in \mathsf{trap}_{\mathsf{heavy}}$. For simplicity of exposition, we do not explicitly mention
this below.

5 For simplicity of exposition, we describe $S_{\mathsf{heavy}}$ as a stateful algorithm.

Preamble phase: 

1. On receiving the first message $\mathsf{msg}^v_1 = (vk_i, h_i)$ from $V^*$ on behalf of $V_i$ in
   the preamble phase of a session, S first checks whether $\pi_i = \bot$ (where $\pi_i$ is
   the i-th entry in the list $\pi$); if the check succeeds, then S updates $\pi_i = h_i(\Pi)$.
   Next, S samples fresh randomness s from its random tape and runs $S_{\mathsf{easy}}$ on
   input $(i, \mathsf{msg}^v_1, \pi, \mathsf{trap}_{\mathsf{easy}}; s)$. S sends the output string c from $S_{\mathsf{easy}}$ to $V^*$.
   Further, S adds $(h_i, c, \cdot, \Pi, y, s)$ to $\mathsf{trap}^{\mathsf{heavy}}_i$ and
   $(c, \cdot, \cdot, \cdot, \cdot, \cdot, \cdot, \cdot, \cdot)$ to $\mathsf{trap}^{\mathsf{easy}}_i$.

2. On receiving the second message $\mathsf{msg}^v_2 = (r, \sigma)$ from $V^*$ on behalf
   of $V_i$ in the preamble phase of a session, S first verifies the validity of the
   signature $\sigma$ w.r.t. $vk_i$. If the check fails, S considers this session aborted (as
   the prover would do) and ignores any additional message for this session.
   Otherwise, S checks whether the entries corresponding to r and $\sigma$ (i.e., 2nd
   and 3rd entries) in $\mathsf{trap}^{\mathsf{easy}}_i$ are $\bot$. If the check succeeds, then:

   - S sets r as the 3rd entry of $\mathsf{trap}^{\mathsf{heavy}}_i$ and r, $\sigma$ as the second and third
     entries of $\mathsf{trap}^{\mathsf{easy}}_i$.

   - Further, S runs $S_{\mathsf{heavy}}$ on input$^6$ $(i, 1, \bot, \mathsf{trap}_{\mathsf{heavy}})$ to compute the message
     $\beta$ of a UARG for the statement: $\exists (\Pi, y, s)$ s.t. $R_{\mathsf{sim}}((h_i, c, r), (\Pi, y, s)) = 1$.
     Here $h_i, c, r, \Pi, y, s$ are such that $\mathsf{trap}^{\mathsf{heavy}}_i = (h_i, c, r, \Pi, y, s)$.

   - On receiving the output message $\beta$, S sets to $\beta$ the fourth slot of $\mathsf{trap}^{\mathsf{easy}}_i$.
     Next, S samples fresh randomness t and runs $S_{\mathsf{easy}}$ on input
     $(i, \mathsf{msg}^v_2, \pi, \mathsf{trap}_{\mathsf{easy}}; t)$. On receiving the output string $c'$ from $S_{\mathsf{easy}}$, S
     forwards it to $V^*$. Further, S sets to $(c', t)$ the 7th and 8th slot of $\mathsf{trap}^{\mathsf{easy}}_i$.

3. Finally, on receiving the last message $\mathsf{msg}^v_3 = (\gamma, \sigma')$ from $V^*$ on behalf
   of $V_i$ in the preamble phase of a session, S first verifies the validity of the
   signature $\sigma'$ w.r.t. $vk_i$. If the check fails, S considers this session aborted
   (as the prover would do) and ignores any additional message for this session.
   Otherwise, S checks whether the entries corresponding to $\gamma$ and $\sigma'$ in
   $\mathsf{trap}^{\mathsf{easy}}_i$ are $\bot$. If the check succeeds, then:

   - S sets to $\gamma$ and $\sigma'$ the 5th and 9th slot of $\mathsf{trap}^{\mathsf{easy}}_i$.

   - Further, S runs $S_{\mathsf{heavy}}$ on input $(i, 2, \gamma, \mathsf{trap}_{\mathsf{heavy}})$ to compute the final
     prover message $\delta$ of the UARG with prefix $(h_i, \beta, \gamma)$, where $(\beta, \gamma)$ are
     the corresponding entries in $\mathsf{trap}^{\mathsf{easy}}_i$.

   - On receiving the output message $\delta$, S sets to $\delta$ the 6th slot of $\mathsf{trap}^{\mathsf{easy}}_i$.

Proof phase: On receiving any message $\mathsf{msg}^v_j$ from $V^*$ on behalf of $V_i$, S runs
$S_{\mathsf{easy}}$ on input $(i, \mathsf{msg}^v_j, \pi, \mathsf{trap}_{\mathsf{easy}})$ and fresh randomness. S forwards the output
message of $S_{\mathsf{easy}}$ to $V^*$.

This completes the description of S and the subroutines $S_{\mathsf{easy}}$, $S_{\mathsf{heavy}}$. It follows
immediately from the above description that S runs in polynomial time and
outputs $\bot$ with probability negligibly close to an honest prover.


6 For simplicity of exposition, we assume that randomness is hardwired in $S_{\mathsf{heavy}}$
and do not mention it explicitly.

We now show through a series of hybrid experiments that the simulator’s output is
computationally indistinguishable from the output of the adversary when
interacting with honest provers. Our hybrid experiments will be $H_i$ for $i = 0, \ldots, 3$.
We write $H_i \approx H_j$ if no $V^*$ can distinguish (except with negligible probability)
between its interaction with $H_i$ and $H_j$.

Hybrid $H_0$. Experiment $H_0$ corresponds to the honest prover. That is, in every
session $j \in [m]$, $H_0$ sends c and $c'$ as commitments to the all-zeros string in
the preamble phase. We provide $H_0$ with a witness that $x \in L$, which it uses to
complete both executions of the WIPOK $(P_{WI}, V_{WI})$ played in each session.

Hybrid $H_1$. Experiment $H_1$ is similar to $H_0$, except the following. For every
$i \in [N]$, for every session corresponding to verifier $V_i$, the commitment c in the
preamble phase is prepared as a commitment to $\pi_i = h_i(\Pi)$, where $h_i$ is the
hash function in the identity of $V_i$ and $\Pi$ is the augmented machine code as
described above.

The computational hiding property of Com ensures that $H_1 \approx H_0$.

Hybrid $H_2$. Experiment $H_2$ is similar to $H_1$, except the following. For every
$i \in [N]$, for every session corresponding to verifier $V_i$, the commitment $c'$ in the
preamble phase is prepared as a commitment to the string $\beta$ with randomness
t, where $\beta$ is the first prover message of a UARG computed by $S_{\mathsf{heavy}}$, in the
manner as described above.

The computational hiding property of Com ensures that $H_2 \approx H_1$.

Hybrid $H_3$. Experiment $H_3$ is similar to $H_2$, except the following. For every
$i \in [N]$, for every session corresponding to verifier $V_i$, the WIPOK $(P_{WI}, V_{WI})$ in
the proof phase is executed using the trapdoor witness $\mathsf{trap}^{\mathsf{easy}}_i$, in the manner
as described above. Note that this is our simulator S.

The witness indistinguishability property of $(P_{WI}, V_{WI})$ ensures that $H_3 \approx H_2$.
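
The three steps can be read as one chain. The display below is only a restatement of the hybrids already given, with each relation annotated by the property invoked for it; it assumes the standard fact that indistinguishability composes over a constant number of hybrids, and the `\overset` annotations assume the `amsmath` package.

```latex
\[
  \underbrace{H_0}_{\text{honest prover}}
  \;\overset{\text{hiding of } c}{\approx}\;
  H_1
  \;\overset{\text{hiding of } c'}{\approx}\;
  H_2
  \;\overset{\text{WI of } (P_{WI}, V_{WI})}{\approx}\;
  \underbrace{H_3}_{\text{simulator } S}
\]
```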

3.2 Proof of Concurrent Soundness 

Consider the interaction between a cheating P* and an honest V. Suppose that 
P* fools V into accepting a false proof in some session with non-negligible prob- 
ability. We show how to reduce P* to an adversary that breaks the security of 
one of the used ingredients. We will first consider P* as a sequential malicious 
prover. We will discuss the issues deriving from a concurrent attack later. 

First of all, notice that by the proof of knowledge property of the second
WIPOK, we have that with non-negligible probability, an efficient adversary E
can simply run as an honest verifier and extract a witness from the WIPOK of
session l where the false statement is proved. Since the statement is false, the
witness extracted will therefore be $(c, r, \sigma, \beta, \gamma, \delta, c', t, \sigma')$ such that
$\mathsf{Verify}_{vk}(c \| r; \sigma) = 1$, $c' = \mathsf{Com}(\beta; t)$, $\mathsf{Verify}_{vk}(c' \| \gamma; \sigma') = 1$, and
$(h, \beta, \gamma, \delta)$ is an accepting transcript for a UARG pUA proving the statement
$\exists (\Pi, y, s)$ s.t. $R_{\mathsf{sim}}((h, c, r), (\Pi, y, s)) = 1$, and h is the hash function
corresponding to the verifier run by E in session l.

By the security of the signature scheme, it must be the case that signatures
$\sigma$ and $\sigma'$ were generated and sent by E during the experiment (the reduction is
standard and omitted).

Therefore we have that with non-negligible probability there is a session i
where h and $\gamma$ were played honestly by E, $(h, \beta, \gamma, \delta)$ is an accepting transcript
for the UARG for $R_{\mathsf{sim}}((h, c, r), (\Pi, y, s)) = 1$, and a commitment to $\beta$ was given
before $\gamma$ was sent. Moreover, there is a session j where c and r were played as
commitment and challenge. Remember that the session l is the one where the
false statement is proved.

We can now complete the proof by relying almost verbatim on the same
analysis of [1,2]. Indeed, by rewinding the prover and changing the challenge r
in session j with another random string, we would have an execution identically
distributed with respect to the previous one. Therefore it will happen with
non-negligible probability that the prover succeeds in session l, still relying on
the information obtained in sessions i and j. The analysis of [1,2], by relying on
the weak proof of knowledge property of the UA, shows that this event can be
reduced to finding a collision that contradicts the collision resistance of h.

We finally discuss the case of a concurrent adversarial prover. Such an at- 
tack is played by a prover aiming at obtaining from concurrent sessions some 
information to be used in the target session where the false theorem must be 
proved. In previous work in the BPK model and in the BP model this was a 
major problem because the verifier used to give a proof of knowledge of its se- 
cret key, and the malleability of such a proof of knowledge could be exploited 
by the malicious prover. Our protocol however bypasses this attack because our 
verifier does not give a proof of knowledge of the secret key of the signature 
scheme, but only gives signatures of specific messages. Indeed the only point in 
which the above proof of soundness needs to be upgraded is the claim that by 
the security of the signature scheme, it must be the case that signatures $\sigma$ and
$\sigma'$ were generated and sent by E during the experiment. In case of sequential
attack, this is true because running the extractor of the WIPOK in session l 
does not impact on other sessions since they were played in full either before 
or after session l. Instead, in case of a concurrent attack, while rewinding the 
adversarial prover, new sessions could be started and more signatures could be 
needed. As a result, it could happen that in such new sessions the prover would 
ask precisely the same signatures that are then extracted from the target session. 
We can conclude that this does not impact on the proof for the following two 
reasons. First, in the proof of soundness it does not matter if those signatures 
appear in the transcript of the attack, or just in the transcript of a rewound
execution. Second, the reduction to the security of the signature scheme works
for any polynomial number of signatures asked to the oracle, therefore still holds 
in case of a concurrent attack. Indeed, the work of E is performed in polynomial 
time even when rewinding a concurrent malicious prover, therefore playing in 
total (i.e., summing sessions in the view of the prover and sessions played during 
rewinds) a polynomial number of sessions, and therefore asking a polynomial 
number of signatures only to the signature oracle. 

Further details on the proof of soundness. Given a transcript (h, UA1, UA2, UA3)
for the universal argument of [1], we stress that soundness still works when the
prover sends the statement to the verifier only at the 4th round, opening a
commitment played in the second round. The proof of concurrent soundness of our
protocol goes through a reduction to the soundness of the universal argument of
[2] and goes as follows.

Let $P^*_{ua}$ be the adversarial prover that we construct against the universal
argument of [2], by making use of the adversary $P^*$ of our protocol. Let $V_{ua}$ be
the honest verifier of the universal argument of [2]. $P^*_{ua}$ gets “h” from $V_{ua}$ and
plays it in a random session s of the experiment (it could therefore be played in
a rewinding thread) with $P^*$. Later on, since by contradiction $P^*$ is successful,
UA messages (UA1, UA2, UA3) are extracted and with noticeable probability
they correspond to session s. Therefore $P^*_{ua}$ sends UA1 to $V_{ua}$ and gets back
UA2'. Then $P^*_{ua}$ rewinds $P^*$ to the precise point where UA2 was played. Now
$P^*_{ua}$ plays UA2'. Again, later on, since by contradiction $P^*$ is successful, $P^*_{ua}$ will
again extract from $P^*$ and with noticeable probability (still because the number
of sessions played in the experiment is polynomial), it will get an accepting
transcript (UA1, UA2', UA3*) for the same statement (this is guaranteed by the
security of the signature scheme and the binding of the commitment). Then $P^*_{ua}$
can send UA3* to $V_{ua}$, therefore proving a false statement.

Abstract. Gennaro, Gentry, Parno and Raykova proposed an efficient
NIZK argument for Circuit-SAT, based on non-standard tools like con- 
scientious and quadratic span programs. We propose a new linear PCP 
for the Circuit-SAT, based on a combination of standard span pro- 
grams (that verify the correctness of every individual gate) and high- 
distance linear error-correcting codes (that check the consistency of wire 
assignments). This allows us to simplify all steps of the argument, which 
results in significantly improved efficiency. We then construct an NIZK 
Circuit-SAT argument based on existing techniques. 

Keywords: Circuit-SAT, linear error-correcting codes, linear PCP, non- 
interactive zero knowledge, polynomial algebra, quadratic span program, 
span program, verifiable computation. 


1 Introduction 

By using non-interactive zero knowledge (NIZK, [3]), the prover can create a
proof $\pi$, s.t. any verifier can later, given access to a common reference string,
the statement, and $\pi$, verify the truth of the intended statement without learning
any side information. Since a single proof might get transferred and verified many 
times, one often requires sublinear communication and verifier’s computation. 
(Unless stated explicitly, we measure the communication in group elements, and 
the computation in group operations.) While succinct NIZK proofs are impor- 
tant in many cryptographic applications, there are only a few different generic 
methodologies to construct them efficiently. 

Groth [16] proposed the first sublinear-communication NIZK argument
(computationally-sound proof, [4]) for an NP-complete language. His
construction was improved by Lipmaa [19]. Their Circuit-SAT argument consists of
efficient arguments for more primitive tasks like Hadamard sum, Hadamard
product and permutation. The Circuit-SAT arguments of [16,19] have
constant communication, quadratic prover’s computation, and linear verifier’s
computation in s (the circuit size). In [16], the CRS length is $\Theta(s^2)$, and in
[19], it is $\Theta(r_3^{-1}(s)) = o(s \cdot 2^{2\sqrt{2 \log_2 s}})$, where
$r_3(N) = \Omega(N \log^{1/4} N / 2^{2\sqrt{2 \log_2 N}})$ [8] is the
cardinality of the largest progression-free subset of [N]. Because of the quadratic
prover’s computation, the arguments of Groth and Lipmaa are not applicable in
practice, unless s is really small. Very recently, Fauzi, Lipmaa and Zhang [10]
constructed arguments for NP-complete languages Set Partition, Subset Sum and
Decision Knapsack with the CRS length $\Theta(r_3^{-1}(s))$ and prover’s computation
$\Theta(r_3^{-1}(s) \log s)$. They did not propose a similar argument for Circuit-SAT.

Gennaro, Gentry, Parno and Raykova [15] constructed a Circuit-SAT NIZK
argument based on efficient (quadratic) span programs. Their argument
consists of two steps. The first step is an information-theoretic reduction from
Circuit-SAT to QSP-SAT [2], the satisfaction problem of quadratic span
programs (QSPs, [15]). The second step consists of cryptographic tools that allow
one to succinctly verify the satisfiability of a QSP.

Intuitively, a span program consists of vectors $u_i$ for $i > 0$, a target vector $u_0$,
and a labelling of every vector $u_i$ by a literal $x_\iota^1 = x_\iota$ or $x_\iota^0 = \bar{x}_\iota$ or by $\bot$. A span
program accepts an input w iff $u_0$ belongs to the span of the vectors $u_i$ that
are labelled by literals $x_\iota^{w_\iota}$ (or by $\bot$) that are consistent with the assignment
$w = (w_\iota)$ to the input $x = (x_\iota)$. I.e., $u_0 = \sum_{i > 0} a_i u_i$, where $a_i = 0$ if the
labelling of $u_i$ is not consistent with w. (More background is given in a later section.)
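
To make the acceptance condition concrete, here is a minimal sketch of a span program check over a prime field. It is an illustration under stated assumptions and not taken from the paper: the helper names (`rank_mod_p`, `accepts`), the label encoding (`(index, bit)` for the literal $x_\iota^{bit}$, `None` for $\bot$), and the toy AND program are all hypothetical. Membership of $u_0$ in the span is tested by comparing ranks before and after adding $u_0$.

```python
from typing import List, Optional, Sequence, Tuple

Label = Optional[Tuple[int, int]]  # (variable index, required bit), or None for "bottom"


def rank_mod_p(rows: Sequence[Sequence[int]], p: int) -> int:
    """Rank of a matrix over GF(p) by Gaussian elimination (p prime, Python 3.8+)."""
    mat = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(mat)) if mat[i][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse of the pivot
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for i in range(len(mat)):
            if i != rank and mat[i][col]:
                factor = mat[i][col]
                mat[i] = [(x - factor * y) % p for x, y in zip(mat[i], mat[rank])]
        rank += 1
    return rank


def accepts(vectors: List[List[int]], labels: List[Label],
            target: List[int], w: List[int], p: int) -> bool:
    """True iff target lies in the span of the vectors whose labels agree with w."""
    consistent = [v for v, lab in zip(vectors, labels)
                  if lab is None or w[lab[0]] == lab[1]]
    return rank_mod_p(consistent, p) == rank_mod_p(consistent + [target], p)


# Toy program computing AND(x_0, x_1): each vector is usable only if its variable is 1.
P = 7
VECTORS = [[1, 0], [0, 1]]
LABELS = [(0, 1), (1, 1)]
TARGET = [1, 1]

print(accepts(VECTORS, LABELS, TARGET, [1, 1], P))  # True
print(accepts(VECTORS, LABELS, TARGET, [1, 0], P))  # False
```

Dropping inconsistent vectors before the rank test is the same as forcing their coefficients $a_i$ to zero in the sum above.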

Briefly, the first step constructs span programs (which satisfy a non-standard 
conscientiousness property) that verify the correct evaluation of every individual 
gate. Conscientiousness means that the span program accepts only if all inputs 
to the span program were actually used (in the case of Circuit-SAT, this means 
that the prover has set some value to every input and output wire of the gate, 
and that exactly the same value can be uniquely extracted from the argument) . 
The gate checkers are aggregated to obtain a single large conscientious span 
program that verifies the operation of every individual gate in parallel. They then 
construct a weak wire checker that verifies consistency, i.e., that all individual 
gate checkers work on an unequivocally defined set of wire values. The weak wire 
checker of [15] guarantees consistency only if all gate checkers are conscientious.
They define quadratic span programs (QSPs, see [15]) and construct a QSP that
implements both the aggregate gate checker and the weak wire checker. 

In the second step, Gennaro et al. construct a non-adaptively sound NIZK
argument that verifies the QSP, with a linear CRS length, $\Theta(s \log^2 s)$ prover’s
computation, and linear-in-input-size verifier’s computation. It can be made
adaptively sound by using universal circuits [28]; see [15] for more information.

The construction of [15] is quite monolithic and, while it contains many new
ideas, they are not sufficiently clarified in [15]. Bitansky et al. [2] simplified the
second step of the construction from [15], by first constructing a linear PCP [2],
then a linear interactive proof, and finally a NIZK argument for Circuit-SAT.
Their more modular approach makes the ideas behind the second step more
accessible. Unfortunately, [2] is slightly less efficient than [15], and uses a
(presumably) stronger security assumption.

We improve the construction of [15] in several aspects. Some improvements are
conceptual (e.g., we provide cleaner definitions that allow us to offer more
efficient constructions) and some of the improvements are technical (with special
emphasis on concrete efficiency). More precisely, we modularize the first step of
[15], thus making its ideas more clear and accessible, to construct a succinct
non-adaptive 3-query linear PCP [2] for Circuit-SAT. Then we use the
techniques of [2], together with several new techniques, to modularize the second
step of [15]. Importantly, and contrary to [2], by doing so we both improve on the
efficiency of both steps and relax the security assumptions. We outline our
construction below, and sketch the differences compared to [15].

Succinct NIZK Arguments from Span Programs and Linear ECCs

The main body of the current work consists of a cleaner and more efficient 
reduction from Circuit-SAT to QSP-SAT (another NP-complete language, 
defined later). Given a circuit C , we construct an efficient circuit checker, a QSP 
that is satisfiable iff C is satisfiable. 

To verify whether circuit C accepts an input, we use a small standard (i.e., 
not necessarily conscientious) span program to verify an individual gate. For 
example, a NAND checker is a span program that accepts if the gate implements 
NAND correctly. We construct efficient span programs for gate checkers, needed 
for the Circuit-SAT argument. E.g., we construct a size 6 and dimension 3 
NAND checker; this can be compared to size 12 and dimension 9 conscientious 
NAND checker from [15]. By using the AND composition of span programs, we
construct a single large span program that verifies every gate in parallel. 

Unfortunately, simple AND composition of the gate checkers is not secure, 
because it allows “double-assignments” . More precisely, some vectors of several 
adjacent gate checkers are labelled by the variable corresponding to the same 
wire. While every individual checker might be locally correct, one checker could 
work with value 0 while another checker could work with value 1 assigned to the 
same wire. Clearly, such bad cases should be detected. More precisely, it must 
be possible to verify efficiently that the coefficients $a_i$ that were used in the gate
checkers adjacent to some wire are consistent with a unique wire value. 

We solve this issue as follows. Let Code be an efficient high-distance linear
$[N, K, D]$ error-correcting code with $D > N/2$. For any wire $\eta$, consider all
vectors from adjacent gate checkers that correspond to the claimed value $x_\eta$
of this wire. Some of those vectors (say $u_i$) are labelled by the positive literal
$x_\eta$ and some (say $v_i$) by the negative literal $\bar{x}_\eta$. The individual gate checker’s
acceptance “fixes” certain coefficients $a_i$ (that are used with $u_i$) and $b_i$ (that are
used with $v_i$) for all adjacent gate checkers. Roughly stating, for consistency of
wire $\eta$ one requires that either all $a_i$ are zero (then unequivocally $x_\eta = 0$), or
all $b_i$ are zero (then unequivocally $x_\eta = 1$). We verify that this is the case by
applying Code separately to the vectors a and b. The high-distance property of
Code guarantees that if a and b are not consistent, then there exists a coefficient
i, s.t. $\mathsf{Code}(a)_i \cdot \mathsf{Code}(b)_i \neq 0$.
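
The reason the distance condition suffices is a counting argument: a nonzero codeword of a linear code with $D > N/2$ has more than $N/2$ nonzero coordinates, so two nonzero codewords must be simultaneously nonzero somewhere. The sketch below checks this exhaustively for a toy code. It is an assumption-laden illustration: it uses a plain (non-systematic) Reed-Solomon evaluation code over GF(5) with $[N, K, D] = [4, 2, 3]$, whereas the paper works with the systematic variant, and the names `encode` and `inconsistent_position` are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence

P = 5                    # field size (prime)
POINTS = [1, 2, 3, 4]    # evaluation points, so N = 4
K = 2                    # message length; D = N - K + 1 = 3 > N / 2


def encode(msg: Sequence[int]) -> List[int]:
    """Evaluate the polynomial with coefficient vector msg at every point."""
    return [sum(m * pow(x, j, P) for j, m in enumerate(msg)) % P for x in POINTS]


def inconsistent_position(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return a coordinate i with Code(a)_i * Code(b)_i != 0, or None if there is none."""
    for i, (x, y) in enumerate(zip(encode(a), encode(b))):
        if (x * y) % P != 0:
            return i
    return None


# Exhaustive check: whenever both a and b are nonzero, a witness coordinate exists.
nonzero = [m for m in product(range(P), repeat=K) if any(m)]
all_detected = all(inconsistent_position(a, b) is not None
                   for a in nonzero for b in nonzero)
print(all_detected)
```

When one of the two vectors is all-zero (the consistent case), its codeword is zero and `inconsistent_position` returns `None`.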

Motivated by this construction, we redefine QSPs [15] as follows. Let $\circ$ denote
the pointwise product of two vectors. A QSP (that consists of two target vectors
$u_0 = (u_{0j}) \in F^d$ and $v_0 = (v_{0j}) \in F^d$ and two $m \times d$ matrices $U = (u_{ij})$ and
$V = (v_{ij})$ for $i \in [m]$ and $j \in [d]$) over some field F accepts an input iff for some
vectors a and b, consistent with this input,

$(a^\top \cdot U - u_0) \circ (b^\top \cdot V - v_0) = 0$ .   (1)

Clearly, Eq. (1) is equivalent to the requirement that for all $j \in [d]$,
$(\sum_{i=1}^{m} a_i u_{ij} - u_{0j}) \cdot (\sum_{i=1}^{m} b_i v_{ij} - v_{0j}) = 0$. Since F is an integral domain, the
latter holds iff for all $j \in [d]$, either $\sum_{i=1}^{m} a_i u_{ij} = u_{0j}$ or $\sum_{i=1}^{m} b_i v_{ij} = v_{0j}$,
which can be seen as an element-wise OR of two span programs. This can be
compared to the element-wise AND of two span programs that accepts iff for
all $j \in [d]$, both $\sum_{i=1}^{m} a_i u_{ij} = u_{0j}$ and $\sum_{i=1}^{m} b_i v_{ij} = v_{0j}$, i.e., iff two span
programs accept simultaneously, i.e., $\sum_i a_i u_i = u_0$ and $\sum_i b_i v_i = v_0$. On the
other hand, it is not known how to implement an element-wise OR composition of
two span programs as a small span program. QSPs add an element-wise OR to an
element-wise AND, and thus it is not surprising that they increase the
expressiveness of span programs significantly.
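
As a small illustration of Eq. (1), the sketch below evaluates the condition coordinate by coordinate over a prime field for given coefficient vectors. It is not the paper's construction: the function name `qsp_check`, the field GF(7), and the toy matrices are assumptions, and the function only tests a supplied pair (a, b); it neither searches for such a pair nor enforces consistency with an input.

```python
from typing import List, Sequence


def qsp_check(a: Sequence[int], b: Sequence[int],
              U: Sequence[Sequence[int]], V: Sequence[Sequence[int]],
              u0: Sequence[int], v0: Sequence[int], p: int) -> bool:
    """Check (a^T U - u0) o (b^T V - v0) = 0 over GF(p), one coordinate at a time."""
    for j in range(len(u0)):
        left = (sum(a[i] * U[i][j] for i in range(len(a))) - u0[j]) % p
        right = (sum(b[i] * V[i][j] for i in range(len(b))) - v0[j]) % p
        # In a field, the product vanishes iff at least one factor vanishes:
        # this is the element-wise OR described in the text.
        if (left * right) % p != 0:
            return False
    return True


P = 7
U = [[1, 0], [0, 1]]
V = [[1, 0], [0, 1]]
U0 = [1, 2]
V0 = [3, 4]

# Coordinate 0 is satisfied on the U side, coordinate 1 on the V side.
print(qsp_check([1, 0], [0, 4], U, V, U0, V0, P))  # True
# With b = 0, coordinate 1 is satisfied on neither side.
print(qsp_check([1, 0], [0, 0], U, V, U0, V0, P))  # False
```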

The above linear error-correcting code based construction implements a QSP
(a wire checker), with U and V being related to the generating matrices of the
code. (See the formal definition later in the paper.) Basically, the wire checker
verifies the consistency of vectors a and b with the input.

We use the systematic Reed-Solomon code, since it is a maximum distance 
separable code with optimal support (i.e., it has the minimal possible number 
of non-zero elements in its generating matrix). It also results in the smallest 
degree of certain polynomials in the full NIZK argument. While no connection 
to error-correcting codes was made in [15], their wire checker can be seen as
a suboptimal (overdefined) variant of the systematic Reed-Solomon code. Due 
to the better theoretical foundation, the new wire checker is more efficient, and 
optimal in its size and support. Moreover, one can use any efficient high-distance 
($D > N/2$) linear error-correcting code, e.g., a near-MDS code [7]. Whether this
would result in any improvement in the computational complexity of the final 
NIZK argument is an interesting open question. 

Moreover, the wire checker of [15] is consistent (and thus their NIZK argument
is sound) only if the gate checkers are conscientious. The new wire checker does
not have this requirement. This not only enables one to use more efficient gate
checkers but also potentially enables one to use known techniques (combinatorial
characterization of span program size m semidefinite programming [24]) to
construct more efficient checkers for larger unit computations.

We construct an aggregate wire checker by applying an AND composition
to wire checkers, and then construct a single QSP (the circuit checker) that
implements both the aggregate gate checker and the aggregate wire checker. At
this point, the approach of the current paper pays off also conceptually: one can
compare the description of the circuit checker (called a canonical QSP) in [15,
Sect. 2.4], that takes about 3/4 of a page, with the description from the current
paper (given as a definition later on) that takes only a couple of lines.

We prove that the circuit checker (the QSP) is satisfiable iff the original circuit 
is satisfiable. Since the efficiency of the new circuit checker depends on the fan- 
out of the circuit, we use the classical result from m about constructing low fan- 
out circuits that allows us to optimize the worst case size and other parameters, 
especially support, of the circuit checker. 

To summarize, the new circuit checker consists of two elements. First, an ag- 
gregate gate checker (a span program) that verifies that every individual gate 
is executed correctly on their local variables. Second, an aggregate wire checker 
(a QSP, based on a high-distance linear error-correcting code) that verifies that
individual gates are executed on the consistent assignments to the variables.
Importantly (for the computational complexity of the NIZK argument), the circuit
checker is a composition of small (quadratic) span programs, and has only a 
constant number of non-zero elements per vector. 

This finishes the description of the Circuit-SAT to QSP-SAT reduction.
To construct an efficient NIZK argument for Circuit-SAT, we need several
extra steps. Based on the new circuit checker, we first construct a non-adaptive
2-query linear PCP ([2], see Sect. 5 for a definition) for Circuit-SAT with linear
communication. This seems to be the first known non-trivial 2-query linear PCP.
Moreover, we use a more elaborate extraction technique which, differently from
the one from [15], also works with non-conscientious gate checkers. This improves
the efficiency of the linear PCP. In particular, the computation of the decision
functionality of the linear PCP is dominated by a small constant number of field
operations. The same functionality required $\Theta(n)$ operations in [15,2].
Interestingly, this construction by itself is purely linear-algebraic, using concepts
like span programs, linear error-correcting codes, and linear PCPs.

To improve the communication of the linear PCP, as in [15] we define
polynomial span programs and polynomial QSPs. Differently from [15] (that only
gave the polynomial definition), our main definition of QSPs, as sketched
above, is linear-algebraic, and we then use a transformation to get a QSP to a
“polynomial” form. We feel the linear-algebraic definition is much more natural,
and describes the essence of QSPs better. Based on the polynomial redefinition
of QSPs and the Schwartz-Zippel lemma, we construct a succinct non-adaptive
3-query linear PCP for Circuit-SAT. The prover’s computation in this linear
PCP is $\Theta(s \log s)$, where s is the size of the circuit, and the verifier’s computation
is again $\Theta(1)$. In [15], the corresponding parameters were $\Theta(s \log^2 s)$ and $\Theta(n)$.
Thus, the new 3-query linear PCP is more efficient and conceptually simpler
than the previously known 3-query linear PCPs [2].

By using techniques of [2], we convert the linear PCP to a succinct
non-adaptive linear interactive proof, and then to a succinct non-adaptive NIZK
argument. (See the full version, [20].) As in the case of the argument from [15],
the latter can be made adaptive by using universal circuits [25].

Since the reduction from linear PCP to NIZK from [2] loses some efficiency
and relies on a stronger security assumption than stated in [15], we also describe
a direct NIZK argument with a (relatively complex) soundness proof that
follows the outline of the soundness proof from [15]. The main difference in the
proof is that we rephrase certain proof techniques from [15] in the language of
multilinear universal hash functions. This might be an interesting contribution
by itself. Apart from a clearer proof, this results in a slightly weaker security
assumption. (See the full version [20] of this paper.)

The new non-adaptive Circuit-SAT argument has CRS length O(s), prover’s
computation O(s log s), verifier’s computation O(1), and communication O(1).
In all cases, the efficiency has been improved as compared to the (QSP-based)
argument from [15]. Moreover, all additional optimization techniques applicable
to the argument from [15] (e.g., the use of collision-resistant hash functions) are
also applicable to the new argument.

We hope that by using our techniques, one can construct efficient NIZK
arguments for other languages, like the techniques of [19] were used in [5] to
construct an efficient range argument, and in [21] to construct an efficient shuffle.
QSPs have more applications than just in the NIZK construction. We only mention
that one can construct a related zap [8], and a related (public or
designated-verifier) succinct non-interactive argument of knowledge (SNARK,
see [22,16]) by using the techniques of [2].

It is also natural to apply our techniques to verifiable computation [15]:
instead of gates, one can talk about small (but possibly much larger)
computational units, and instead of wires, about the values transferred between the
computational units. Since here one potentially deals with much larger span
programs than in the case of the Circuit-SAT argument, the use of standard
(non-conscientious) span programs is especially beneficial. Since in the case of
verifiable computation, the computed function F (and thus also the circuit C) is
known while generating the CRS, one can use the non-adaptively sound version
of the new argument [23].

Gennaro et al. [15] also proposed a NIZK argument that is based on quadratic
arithmetic programs (QAPs), a novel computational model for arithmetic
circuits. QAP-based arguments are often significantly more efficient than QSP-based
arguments, see [15,23]. We can use our techniques to improve on QAP-based
arguments, but here the improvements are less significant and thus we have omitted
the full discussion. (See the full version.) Briefly, differently from [15], we give an
(again, cleaner) linear-algebraic definition of QAPs. This enables us to present a
short alternative proof of the result from [15] that any arithmetic circuit with n
inputs and s multiplication gates can be computed by a QAP of size n + s and
dimension s. We remark that the QAP-based construction results in a 4-query
linear PCP, while the QSP-based construction from the current paper results in a
3-query linear PCP.

Due to the lack of space, many proofs are given only in the full version [20].


2 Preliminaries: Circuits and Circuit-SAT 

For a fixed circuit $C$, let $s = |C|$ be its size (the number of gates), $s_e$ its number
of wires, and $n$ be its input size. Every gate $\iota$ computes some unary or binary
function $f_\iota : \{0,1\}^{\le 2} \to \{0,1\}$. We denote the set of gates of $C$ by $[s]$ and the
set of wires of $C$ by $[s_e]$. Assume that the first $n$ wires, $\eta \in [n]$, start from $n$ input
gates $\iota \in [n]$. Every wire $\eta \in [s_e]$ corresponds to a formal variable $x_\eta$ in a natural
way. This variable obtains an assignment $w_\eta$, $\eta \in [s_e]$, computed by $C$ from the
input assignment $(w_i)_{i=1}^{n}$. Denote $w := (w_\eta)_{\eta=1}^{s_e}$. We write $C(w) := C((w_i)_{i=1}^{n})$.
For a gate $\iota$ of $C$, let $\deg^+(\iota)$ be its fan-out, and let $\deg^-(\iota)$ be its fan-in. Let
$\deg(\iota) = \deg^-(\iota) + \deg^+(\iota)$.

Let $\mathrm{poly}(x) := x^{O(1)}$. Let $\mathcal{R} = \{(C, w)\}$ be an efficiently computable binary
relation with $|w| = \mathrm{poly}(|C|)$ and $s := |C| = \mathrm{poly}(|w|)$. Here, $C$ is a statement,
and $w$ is a witness. Let $\mathcal{L} = \{C : \exists w, (C, w) \in \mathcal{R}\}$ be the related NP-language.
For fixed $s$, we have a relation $\mathcal{R}_s$ and a language $\mathcal{L}_s$.

The language Circuit-SAT consists of all (strings representing) circuits that 
produce a single bit of output and that have a satisfying assignment. That is, 
a string representing a circuit $C$ is in Circuit-SAT if there exists $w \in \{0,1\}^{s_e}$
such that $C(w) = 1$.

As before, we assume that $s = |C|$ is the number of gates, not the bitlength
needed to represent $C$. Thus, $\mathcal{L}_s = \{C : |C| = s \wedge (\exists w \in \{0,1\}^{s_e}, C(w) = 1)\}$
and $\mathcal{R}_s = \{(C, w) : |C| = s \wedge w \in \{0,1\}^{s_e} \wedge C(w) = 1\}$.

Let $G = (V, E)$ be the hypergraph of the circuit $C$. The vertices of $G$
correspond to the gates of $C$. A hyperedge $\eta$ connects the input gate of some wire
to (potentially many) output gates of the same wire. In $C$, an edge $\eta$ (except
input edges, that have $\phi$ adjacent vertices) has $\phi + 1$ adjacent vertices, where $\phi$
is the fan-out of $\eta$'s designated input gate. Every vertex of $G$ can only be the
starting gate of one hyperedge and the final gate of two hyperedges (since we
only consider unary and binary gate operations). Thus, $|E(G)| \le 2(|V(G)| - n)$.

3 Preliminaries: Span Programs 

Let $\mathbb{F} = \mathbb{F}_q$ be a finite field of size $q \gg 2$, where $q$ is a prime. However, most
of the results can be generalized to arbitrary fields. By default, vectors like $u$
denote row vectors. For matrix $U$, let $u_i$ be its $i$th row vector. For an $m \times d$
matrix $U$ over $\mathbb{F}$, let $\mathrm{span}(U) := \{\sum_{i=1}^{m} a_i u_i : a \in \mathbb{F}^m\}$. Let $x_\iota$, $\iota \in [n]$, be formal
variables. Denote the positive literals $x_\iota$ by $x_\iota^1$ and the negative literals $\bar{x}_\iota$ by $x_\iota^0$.

A span program [18] $P = (u_0, U, \varrho)$ over a field $\mathbb{F}$ is a linear-algebraic
computation model. It consists of a non-zero target vector $u_0 \in \mathbb{F}^d$, an $m \times d$ matrix
$U$ over $\mathbb{F}$, and a labelling $\varrho : [m] \to \{x_\iota, \bar{x}_\iota : \iota \in [n]\} \cup \{\bot\}$ of $U$'s rows by
one of $2n$ literals or by $\bot$. Let $U_w$ be the submatrix of $U$ consisting of those
rows whose labels are satisfied by the assignment $w \in \{0,1\}^n$, that is, belong
to $\{x_\iota^{w_\iota} : \iota \in [n]\} \cup \{\bot\}$. $P$ computes a function $f$, if for all $w \in \{0,1\}^n$:
$u_0 \in \mathrm{span}(U_w)$ if and only if $f(w) = 1$.

Let $\varrho_w^{-1} = \{i \in [m] : \varrho(i) \in \{x_\iota^{w_\iota} : \iota \in [n]\} \cup \{\bot\}\}$ be the set of rows
whose labels are satisfied by the assignment $w$. The size, $\mathrm{size}(P)$, of $P$ is $m$. The
dimension, $\mathrm{sdim}(P)$, is equal to $d$. $P$ has support $\mathrm{supp}(P)$, if all vectors $u \in U$
have altogether $\mathrm{supp}(P)$ non-zero elements. Clearly, $u_0$ can be replaced by an
arbitrary non-zero vector; one obtains the corresponding new span program (of
the same size and dimension, but possibly different support) by applying a basis
change matrix. Let $D(x_\iota) := \max_{j \in \{0,1\}} |\varrho^{-1}(x_\iota^j)|$, for each $\iota \in [n]$ and $j \in \{0,1\}$,
be the maximum number of vectors that have the same label $(\iota, j)$; this parameter
is needed when we construct wire checkers.
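
The acceptance condition above (the target vector lies in the span of the rows whose labels are satisfied) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the helper names `in_span` and `sp_accepts` are hypothetical, labels are encoded as pairs (variable index, bit) with `None` standing for $\bot$, and the two toy span programs for AND and XOR are textbook-style instances chosen for illustration, not necessarily the ones drawn in Fig. 1.

```python
from itertools import product


def in_span(rows, target, p):
    """Return True iff `target` is an F_p-linear combination of `rows`."""
    d = len(target)
    # One linear equation per coordinate j: sum_i a_i * rows[i][j] = target[j].
    aug = [[r[j] % p for r in rows] + [target[j] % p] for j in range(d)]
    rank = 0
    for col in range(len(rows)):
        pivot = next((r for r in range(rank, d) if aug[r][col]), None)
        if pivot is None:
            continue
        aug[rank], aug[pivot] = aug[pivot], aug[rank]
        inv = pow(aug[rank][col], -1, p)  # modular inverse (Python 3.8+)
        aug[rank] = [(v * inv) % p for v in aug[rank]]
        for r in range(d):
            if r != rank and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[rank])]
        rank += 1
    # The system is solvable unless some equation was reduced to 0 = nonzero.
    return all(any(eq[:-1]) or eq[-1] == 0 for eq in aug)


def sp_accepts(u0, U, labels, w, p):
    """Evaluate a span program on w: keep rows whose literal is satisfied."""
    rows = [U[i] for i, lab in enumerate(labels)
            if lab is None or w[lab[0]] == lab[1]]
    return in_span(rows, u0, p)


P = 7  # any prime works for these toy instances

# Assumed toy span program for AND: rows labelled x and y.
AND_SP = ([1, 1], [[1, 0], [0, 1]], [(0, 1), (1, 1)])
# Assumed toy span program for XOR: rows labelled x, y, not-x, not-y.
XOR_SP = ([1, 1], [[1, 0], [1, 0], [0, 1], [0, 1]],
          [(0, 1), (1, 1), (0, 0), (1, 0)])

and_table = {w: sp_accepts(*AND_SP, w, P) for w in product((0, 1), repeat=2)}
xor_table = {w: sp_accepts(*XOR_SP, w, P) for w in product((0, 1), repeat=2)}
```

Both toy programs have one row per literal they mention, so in the notation above $D(x) = D(y) = 1$; the XOR instance has size 4 and dimension 2.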

Complex span programs are constructed by using simple span programs and
their composition rules. The Boolean function NAND $\barwedge$ is defined as $\barwedge(x, y) =
x \barwedge y = \neg(x \wedge y)$. Span programs for AND, NAND, OR, XOR, and equality
of two variables $x$ and $y$ are as in Fig. 1. Given span programs $P_0 = SP(f_0)$
and $P_1 = SP(f_1)$ for functions $f_0$ and $f_1$, one uses well-known AND and OR
compositions to construct span programs for $f_0 \wedge f_1$ and $f_0 \vee f_1$.

Fig. 1. From left to right: standard span programs $SP(\wedge)$, $SP(\barwedge)$, $SP(\vee)$, $SP(\oplus)$,
$SP(=)$ and new span programs $SP(c_\barwedge)$ and $SP(c_Y)$


A span program $(u_0, U, \varrho)$ is conscientious [15] if a linear combination
associated to a satisfying assignment must use at least one vector associated to either
$x_\iota$ or $\bar{x}_\iota$ for every $\iota \in [n]$. Clearly, $SP(\wedge)$, $SP(\oplus)$ and $SP(=)$ are conscientious,
while $SP(\vee)$ is not.

4 Efficient Gate Checkers 

A gate checker for a gate that implements $f : \{0,1\}^n \to \{0,1\}$ is a function
$c_f : \{0,1\}^{n+1} \to \{0,1\}$, s.t. $c_f(x, y) = 1$ iff $f(x) = y$. The NAND-checker
$c_\barwedge : \{0,1\}^3 \to \{0,1\}$ outputs 1 iff $z = x \barwedge y$.

Lemma 1. $SP(c_\barwedge)$ on Fig. 1 is a span program for $c_\barwedge$. It has size 6, dimension
3, and support 7.

As seen from the proof, given an accepting assignment $(x, y, z)$, one can
efficiently find small values $a_i \in [-2, 1]$ such that $\sum_{i \ge 1} a_i u_i = u_0$. However,
a satisfying input to $SP(c_\barwedge)$ does not fix the values $a_i$ unequivocally: if
$(x, y, z) = (0, 0, 1)$ (that is, $a_1 = a_2 = a_6 = 0$), then one can choose an arbitrary
$a_4$ and set $a_5 \leftarrow 1 - a_4$. Since one can set $a_4 = 0$, $SP(c_\barwedge)$ is not conscientious.

Given $SP(c_\barwedge)$, one can construct a size 6 and dimension 3 span program for
the AND-checker $c_\wedge(x, y, z) := (x \wedge y) \oplus \bar{z}$ by interchanging in $SP(c_\barwedge)$ the rows
labelled by $z$ and $\bar{z}$. Similarly, one can construct a size 6 and dimension 3 span
program for the OR-checker $c_\vee(x, y, z) := (\bar{x} \wedge \bar{y}) \oplus z$ by interchanging in $SP(c_\barwedge)$
the rows labelled by $x$ and $\bar{x}$, and the rows labelled by $y$ and $\bar{y}$. The NOT-checker
$[x \ne y] = x \oplus y$ is just the XOR function, and thus one can construct a size 4
and dimension 2 span program for the NOT-checker function.
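
The relations between the checkers used above are plain Boolean identities, so they can be confirmed by exhausting the truth table. The sketch below is only an illustration; the helper names (`checker`, `identities_hold`) are hypothetical, and it checks the functions themselves, not the span programs of Fig. 1.

```python
from itertools import product


def checker(f):
    """Gate checker c_f(x, y, z): returns 1 iff z equals f(x, y)."""
    return lambda x, y, z: int(f(x, y) == z)


c_nand = checker(lambda x, y: 1 - (x & y))
c_and = checker(lambda x, y: x & y)
c_or = checker(lambda x, y: x | y)


def identities_hold():
    for x, y, z in product((0, 1), repeat=3):
        # NAND-checker written as an XOR: (x AND y) XOR z.
        if c_nand(x, y, z) != (x & y) ^ z:
            return False
        # AND-checker = NAND-checker with z and not-z interchanged.
        if c_and(x, y, z) != c_nand(x, y, 1 - z):
            return False
        # OR-checker = NAND-checker with x, y and their negations interchanged.
        if c_or(x, y, z) != c_nand(1 - x, 1 - y, z):
            return False
    # NOT-checker [x != y] coincides with XOR.
    return all(int(x != y) == x ^ y for x, y in product((0, 1), repeat=2))
```

Interchanging a variable with its negation at the function level corresponds to swapping the two groups of rows labelled by that variable, which is why the derived span programs keep size 6 and dimension 3.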

We need the dummy gates $y \leftarrow x$, and corresponding dummy checkers
$c_=(x, y) = [x = y]$. Clearly, the dummy checker function is just the equality
test, and thus has a conscientious span program of size 4 and dimension 2.
Moreover, if $x = y \in \{0,1\}$, then $a_1 = a_2 = x$, while $a_3 = a_4 = 1 - x$.

We need the fork-checker $c_Y(x, y_1, y_2)$ for the fork gate that computes $y_1 \leftarrow x$,
$y_2 \leftarrow x$. In the CNF form, $c_Y(x, y_1, y_2) = (\bar{x} \vee y_2) \wedge (x \vee \bar{y}_1) \wedge (y_1 \vee \bar{y}_2)$. Since every
literal is mentioned once in the CNF, we can use AND and OR compositions to
derive the span program on Fig. 1. It has size 6, dimension 3, and support 6.

We also need a 1-to-$\phi$ fork-checker that has 1 input $x$ and $\phi$ outputs $y_\iota$, with
$y_\iota \leftarrow x$. The $\phi$-fork checker is $c_Y^\phi(x, y) = (x \wedge y_1 \wedge \cdots \wedge y_\phi) \vee (\bar{x} \wedge \bar{y}_1 \wedge \cdots \wedge \bar{y}_\phi)$.
Clearly, $c_Y^\phi$ has CNF $c_Y^\phi(x, y) = (x \vee \bar{y}_1) \wedge (y_1 \vee \bar{y}_2) \wedge \cdots \wedge (y_{\phi-1} \vee \bar{y}_\phi) \wedge (y_\phi \vee \bar{x})$.
From this we construct a span program exactly as in the case $\phi = 2$, with size
$2(\phi + 1)$ and dimension $\phi + 1$. It has only one vector labelled with every $x$/$y_\iota$
or its negation, thus $D(x) = D(y_\iota) = 1$ for all $\iota$. To compute the support, note
that $SP(c_Y^\phi)$ has two 1-entries in every column, and one in every row. Thus,

$\mathrm{supp}(SP(c_Y^\phi)) = \sum_{i=1}^{\phi+1} 2 = 2\phi + 2$.
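
The CNF of the fork checker is a cycle of implications ($x \Rightarrow y_\phi \Rightarrow \cdots \Rightarrow y_1 \Rightarrow x$), which forces all variables to be equal. A short Python sketch can confirm this by brute force for small fan-outs; the function names are hypothetical, and the parameter formulas simply restate the counts given in the text (one row per literal, one column per clause, two non-zero entries per column).

```python
from itertools import product


def fork_checker(x, ys):
    """1 iff every output copy y_i equals the input x."""
    return int(all(y == x for y in ys))


def fork_cnf(x, ys):
    """CNF (x or ~y1) & (y1 or ~y2) & ... & (y_phi or ~x)."""
    chain = [x] + list(ys)
    clauses = [bool(chain[i]) or not chain[i + 1] for i in range(len(ys))]
    clauses.append(bool(ys[-1]) or not x)
    return int(all(clauses))


def cnf_matches(phi):
    return all(fork_cnf(x, ys) == fork_checker(x, ys)
               for x in (0, 1) for ys in product((0, 1), repeat=phi))


def fork_sp_params(phi):
    """Size, dimension and support of the fork span program as stated above."""
    clauses = phi + 1
    return {"size": 2 * clauses, "sdim": clauses, "supp": 2 * clauses}
```

For $\phi = 2$ this reproduces the figures quoted for the 2-fork: size 6, dimension 3, support 6.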


5 Aggregate Gate Checker 

Given a circuit that consists of NAND, AND, OR, XOR, and NOT gates, we
combine the individual gate checkers by using the AND composition rule. In
addition, for the wire checker of Sect. 6.2 (and thus also the final NIZK argument)
to be more efficient, all gates of the circuit $C$ need to have a small fan-out. In [15],
the authors designed a circuit of size $3 \cdot |C|$ that implements the functionality
of $C$ but only has fan-out 2 except for a specially introduced dummy input.
Their aggregate gate checker (AGC) has size $36 \cdot |C|$ and dimension $27 \cdot |C|$.
By using the techniques of [11] (that replaces every high fan-out gate with an
inverse binary tree of fork gates, and then gives a more precise upper bound of
the resulting circuit size), we prove a more precise result. We do not introduce
the dummy input but we still add a dummy gate for every input. We then say
that we deal with a circuit with dummy gates.

Since we are interested in circuit satisfiability, the X-checker (where say X =
NAND) of the circuit’s output gate simplifies to the X gate (e.g., the NAND checker
simplifies to NAND). X has a more efficient span program than the X-checker,
but for the sake of simplicity, we will not mention this any more.

Let $C$ be a circuit. The AGC function agc of a circuit $C$ is a function agc :
$\{0,1\}^{\sum_{\iota=1}^{|C|} \deg(\iota)} \to \{0,1\}^{|C|}$. If $c_\iota$ is the gate checker of the $\iota$th gate and $x_\iota$ has
dimension $\deg(\iota)$, then $\mathrm{agc}(x_1, \ldots, x_{|C|}) = (c_1(x_1), \ldots, c_{|C|}(x_{|C|}))$.

As in [15], we construct the AGC by AND-composition of the individual gate
checkers. Since for an individual gate checker and a satisfying
assignment, one can compute the corresponding coefficient vector $a$ in constant
time, the aggregate coefficient vector $a$ can be computed from $w$ in time $O(s)$.
Let $a \leftarrow \mathrm{c2q}(w)$ be the corresponding algorithm.

Theorem 1. Let $f : \{0,1\}^n \to \{0,1\}$ be the function computed by a fan-in $\le 2$
circuit $C$ with $s = |C|$ NAND, AND, OR, XOR, and NOT gates. There exists
a fan-in $\le 2$ and fan-out $\le \phi$ circuit with dummy gates $C_{bnd}$ for $f$, that has
the same $s$ gates as $C$, $n$ additional dummy gates, and up to $(s - 2n)/(\phi - 1)$
additional $\phi$-fork gates. Let $\phi^* := 1/(\phi - 1)$. The AGC $\mathrm{agc}(C_{bnd})$ has a span
program $P$ with $\mathrm{size}(P) \le (8 + 4\phi^*)s - (6 + 8\phi^*)n$, $\mathrm{sdim}(P) \le (4 + 2\phi^*)s -
(3 + 4\phi^*)n$, and $\mathrm{supp}(P) \le (9 + 4\phi^*)s - (5 + 8\phi^*)n$. If $\phi = 3$, then $\mathrm{size}(P) \le
10s - 10n$, $\mathrm{sdim}(P) \le 5s - 5n$, and $\mathrm{supp}(P) \le 11s - 9n$.

The upper bounds of this theorem are worst-case, and often imprecise. The
optimal choice of $\phi$ depends on the parameter that we are going to optimize.
The AGC has optimal size, dimension and support if $\phi$ is large (preferably even
if the fan-out bounding procedure of Thm. 1 is not applied at all). The support
of the aggregate wire checker (see Sect. 6.3) is minimized if $\phi = 2$. To balance
the parameters, we concentrate on the case $\phi = 3$.
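
To see how the bounds of Thm. 1 react to the fan-out bound, it can help to evaluate them with exact rational arithmetic. The following sketch only restates the three formulas of the theorem; the function name `agc_bounds` and the sample values of $s$ and $n$ are illustrative assumptions.

```python
from fractions import Fraction


def agc_bounds(s, n, phi):
    """Worst-case AGC bounds of Thm. 1 for s gates, n inputs, fan-out phi."""
    ps = Fraction(1, phi - 1)  # phi* = 1 / (phi - 1)
    return {
        "size": (8 + 4 * ps) * s - (6 + 8 * ps) * n,
        "sdim": (4 + 2 * ps) * s - (3 + 4 * ps) * n,
        "supp": (9 + 4 * ps) * s - (5 + 8 * ps) * n,
    }


# Illustrative values only: s = 100 gates, n = 10 inputs.
for phi in (2, 3, 5, 9):
    bounds = agc_bounds(100, 10, phi)
    print(phi, {k: float(v) for k, v in bounds.items()})
```

With $\phi = 3$ the general expressions collapse to $10s - 10n$, $5s - 5n$ and $11s - 9n$, and all three bounds shrink as $\phi$ grows whenever $s > 2n$, in line with the remark that the AGC alone prefers a large fan-out.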

6 Quadratic Span Programs and Wire Checker 

6.1 Quadratic Span Programs 

An intuitive definition of quadratic span programs (QSPs) was given in the
introduction and will not be repeated here. We now give a formal (linear-algebraic)
definition of QSPs. In Sect. 9 we will provide an equivalent polynomial
redefinition of QSPs that is the same as the definition given in [15].

Definition 1. A quadratic span program (QSP) $Q = (u_0, v_0, U, V, \varrho)$ over a
field $\mathbb{F}$ consists of two target vectors $u_0, v_0 \in \mathbb{F}^d$, two $m \times d$ matrices $U$ and $V$,
and a common labelling $\varrho : [m] \to \{x_\iota, \bar{x}_\iota : \iota \in [n]\} \cup \{\bot\}$ of the rows of $U$ and $V$.
$Q$ accepts an input $w \in \{0,1\}^n$ iff there exist $(a, b) \in \mathbb{F}^m \times \mathbb{F}^m$, with $a_i = 0 = b_i$
for all $i \notin \varrho_w^{-1}$, such that $(a^T \cdot U - u_0) \circ (b^T \cdot V - v_0) = 0$ (1), where $x \circ y$ denotes
the pointwise (Hadamard) product of $x$ and $y$. $Q$ computes a function $f$ if for
all $w \in \{0,1\}^n$: $f(w) = 1$ iff $Q$ accepts $w$.
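
The acceptance condition of Definition 1 is a coordinate-wise product test, which the following Python sketch spells out for a given candidate witness $(a, b)$. It is an illustration under assumptions: the function names, the `active` argument (standing for the set of rows whose labels are satisfied by $w$) and the toy QSP are all hypothetical, and the sketch only verifies a witness rather than searching for one.

```python
def vec_mat(a, M, p):
    """Row vector times matrix over F_p: (a^T M)_j = sum_i a_i * M[i][j]."""
    return [sum(ai * row[j] for ai, row in zip(a, M)) % p
            for j in range(len(M[0]))]


def qsp_check(a, b, u0, v0, U, V, p, active=None):
    """Check (a^T U - u0) o (b^T V - v0) = 0 for a candidate witness (a, b)."""
    if active is not None:
        # Rows whose label is not satisfied must get zero coefficients.
        if any((a[i] or b[i]) for i in range(len(a)) if i not in active):
            return False
    left = [(x - t) % p for x, t in zip(vec_mat(a, U, p), u0)]
    right = [(y - t) % p for y, t in zip(vec_mat(b, V, p), v0)]
    return all((l * r) % p == 0 for l, r in zip(left, right))


# Assumed toy QSP over F_7 with U = V = identity, u0 = (1, 1), v0 = (0, 0):
# column j is satisfied iff a_j = 1 or b_j = 0.
P = 7
U = V = [[1, 0], [0, 1]]
u0, v0 = [1, 1], [0, 0]
```

Each column contributes one quadratic constraint, so a single non-zero product in any coordinate already makes the QSP reject.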

We remark that one can have $u_0 = v_0 = 0$. (See Def. 2 for example.)

The size, $\mathrm{size}(Q)$, of $Q$ is $m$. The dimension, $\mathrm{sdim}(Q)$, of $Q$ is $d$. The support,
$\mathrm{supp}(Q)$, of $Q$ is equal to the sum of the supports (that is, the number of
non-zero elements) of all vectors $u_i$ and $v_i$. Clearly, one can compose QSPs by using
the AND and OR composition rules of span programs, though one has to take
care to apply the same transformation to both $U$ and $V$ simultaneously.

The language QSP-SAT consists of all (strings representing) QSPs that
produce a single bit of output and that have a satisfying assignment. I.e., a string
representing an $n$-input QSP $Q$ is in QSP-SAT if there exists $w \in \{0,1\}^n$, such
that $Q(w) = 1$. The witness of this fact is $(a, b)$, and we write $Q(a, b) = Q(w)$.

6.2 Wire Checker 

Gate checkers verify that every individual gate is followed correctly, i.e., that its 
output wire obtains a value which is consistent with its input wires. One also 
requires inter-gate (wire) consistency that ensures that adjacent gate checkers do 
not make double assignments to any of the wires. Here, we consider hyperwires 
that have one input gate and potentially many output gates. Following [15], for
this purpose we construct a wire checker. We first construct a wire checker for
every single wire (that verifies that the variables involved in the span programs
of the vertices that are adjacent to this concrete wire do not get inconsistent
assignments), and then aggregate them by using an AND composition.

For a (hyper)wire $\eta$, let $N(\eta)$ be the set of $\eta$'s adjacent gates. For gate $\iota \in
N(\eta)$, let $P_\iota = (u_0^{(\iota)}, U^{(\iota)}, \varrho^{(\iota)})$ be its gate checker. For every $\iota \in N(\eta)$, one of the
input or output variables of $P_\iota$ (that we denote by $x_{\iota:\eta}$) corresponds to $x_\eta$. Recall
that for a local variable $y$ of a span program $P_\iota$, $D(y) = \max(|\varrho^{-1}(y)|, |\varrho^{-1}(\bar{y})|)$.
We assume $|\varrho^{-1}(y)| = |\varrho^{-1}(\bar{y})|$, by adding zero vectors to the span programs if
necessary. Let $D(\eta) := \sum_{\iota \in N(\eta)} D(x_{\iota:\eta})$ be the number of the times the rows of
adjacent gate checkers have been labelled by a local copy of $x_\eta$.

We define the $\eta$th wire checker between the rows of adjacent gates $\iota \in N(\eta)$
in the AGC that are labelled either by the local variable $x_{\iota:\eta}$ or its negation $\bar{x}_{\iota:\eta}$,
i.e., between $2D(\eta)$ rows $\{i : \exists k \in N(\eta)$ s.t. $\varrho^{(k)}(i) = x_{k:\eta} \vee \varrho^{(k)}(i) = \bar{x}_{k:\eta}\}$. Let
$\psi$ be the natural labelling of the wire checkers, with $\psi(i) = x_\eta^j$ iff $\varrho^{(k)}(i) = x_{k:\eta}^j$
for some $k \in N(\eta)$.

Example 1. Consider a (hyper)wire $\eta$ that has one input gate $\iota_1$ and two output
gates $\iota_2$ and $\iota_3$. Assume that all three gates implement NAND, and thus they
have gate checkers $SP(c_\barwedge)$ from Fig. 1. Assume that $x_\eta = z_{\iota_1} = x_{\iota_2} = y_{\iota_3}$. Thus,
the $\eta$th wire checker is defined between the rows 3 and 6 of the checker for $\iota_1$,
rows 1 and 4 of the checker for $\iota_2$, and rows 2 and 5 of the checker for $\iota_3$. Thus,
$D(\eta) = D(z_{\iota_1}) + D(x_{\iota_2}) + D(y_{\iota_3}) = 6$. □

We first define the wire checker for a wire $\eta$ and thus for one variable $x_\eta$. In
Sect. 6.3 we will give a definition and a construction in the aggregate case.

For $y = (y_1, \ldots, y_{2D})^T$, let $y^{(1)} := (y_1, \ldots, y_D)^T$ and $y^{(2)} :=
(y_{D+1}, \ldots, y_{2D})^T$. Fix a wire $\eta$. Assume that $D = D(\eta)$. Let $Q =
(u_0, v_0, U, V, \psi)$, with $m \times d$ matrices $U$ and $V$, be a QSP. $Q$ is a wire checker,
if for any $a, b \in \mathbb{F}^{2D}$, Eq. (1) holds iff $a$ and $b$ are consistent bit assignments in
the following sense: for both $k \in \{1, 2\}$, either $a^{(k)} = 0$ or $b^{(k)} = 0$.

We propose a new wire checker that is based on the properties of high-distance 
linear error-correcting codes, see the introduction for some intuition. To obtain 
optimal efficiency, we choose particular codes (namely, systematic Reed-Solomon 
codes). 


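Definition 2. Let $D^* := 2D - 1$. Let $RS_D$ be the $D \times D^*$ generator matrix
of the $[D^*, D, D]_q$ systematic Reed-Solomon code. Let $m = 2D$ and $d = 2D^*$. Let

$$U_{wc} = U = \begin{pmatrix} RS_D & 0_{D \times D^*} \\ 0_{D \times D^*} & RS_D \end{pmatrix}, \qquad V_{wc} = V = \begin{pmatrix} 0_{D \times D^*} & RS_D \\ RS_D & 0_{D \times D^*} \end{pmatrix}.$$

The code-based wire checker is $Q_{wc} := (0, 0, U, V, \psi)$, where $\psi^{-1}(x_\eta) = [1, D]$ and
$\psi^{-1}(\bar{x}_\eta) = [D + 1, 2D]$.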


We informally define the degree $\mathrm{sdeg}(Q)$ of a (quadratic) span program $Q$ as
the degree of the interpolating polynomial that obtains the value $u_{ij}$ at point $j$.
See Sect. 9 for a formal definition.


Lemma 2. $Q_{wc}$ is a wire checker of size $2D$, degree $D + D^* = 3D - 1$, dimension
$2D^* = 4D - 2$, and support $4D^2$.


Proof. The claim about the parameters follows straightforwardly from the
properties of the code. It is easy to see that if $a$ and $b$ are consistent bit
assignments, then $Q_{wc}$ accepts. For example, if $a^{(1)} = b^{(1)} = 0$, then clearly
$(a^T \cdot U)_j = \sum_{i=1}^{m} a_i u_{ij} = 0$ for $j \in [1, D^*]$ and $(b^T \cdot V)_j = \sum_{i=1}^{m} b_i v_{ij} = 0$ for
$j \in [D^* + 1, 2D^*]$. Thus, $(a^T \cdot U)_j \cdot (b^T \cdot V)_j = 0$ for $j \in [1, 2D^*]$, and thus
$(a^T \cdot U - 0) \circ (b^T \cdot V - 0) = 0$.

Now, assume that $a$ and $b$ are inconsistent bit assignments, i.e., $a^{(k)} \ne 0$ and
$b^{(k)} \ne 0$ for some $k \in \{1, 2\}$. W.l.o.g., let $k = 1$. Since $RS_D$ is the generator
matrix of the systematic Reed-Solomon code, the vectors $a^T \cdot RS_D$ and $b^T \cdot RS_D$
each have at least $D > D^*/2$ non-zero coefficients among their first $D^*$ coefficients.
Thus, both $\sum_{i=1}^{m} a_i u_{ij}$ and $\sum_{i=1}^{m} b_i v_{ij}$ are non-zero for more than $D^*/2$
different values $j \in [D^*]$. Hence, there exists a coefficient $j \in [D^*]$, such that
$(\sum_{i=1}^{m} a_i u_{ij})(\sum_{i=1}^{m} b_i v_{ij}) \ne 0$. Thus, $Q_{wc}$ does not accept. □

We chose a Reed-Solomon code since it is a maximum distance separable (MDS)
code and thus minimizes the number of columns in $RS_D$. It also naturally
minimizes the degree of the wire checker. Moreover, $RS_D$ has $D^2$ non-zero elements.
Clearly (and this is the reason we use a systematic code), $D^2$ is also the smallest
support a generator matrix $G$ of an $[n = 2D - 1, k = D, d = D]_q$ code can
have, since every row of $G$ is a codeword and thus must have at least $d$ non-zero
entries. Thus, $G$ must have at least $dD \ge D^2$ non-zero entries, where the last
inequality is due to the Singleton bound.
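
The counting argument behind the code-based wire checker (two non-zero codewords of a $[2D - 1, D, D]$ code must share a non-zero position) can be checked exhaustively on toy parameters. The Python sketch below is an illustration under assumptions: it builds a Reed-Solomon code by evaluating polynomials of degree below $D$ at the points $0, \ldots, 2D - 2$ of a small prime field and row-reduces the generator matrix to systematic form; the function names and the choice of evaluation points are hypothetical.

```python
from itertools import product


def systematic_rs(D, p):
    """D x (2D - 1) systematic Reed-Solomon generator matrix over F_p."""
    n = 2 * D - 1
    assert p >= n, "need 2D - 1 distinct evaluation points in F_p"
    # Row i evaluates the monomial x^i at the points 0, 1, ..., n - 1.
    G = [[pow(x, i, p) for x in range(n)] for i in range(D)]
    # Gauss-Jordan elimination on the first D columns yields the form [I | P].
    for c in range(D):
        piv = next(r for r in range(c, D) if G[r][c])
        G[c], G[piv] = G[piv], G[c]
        inv = pow(G[c][c], -1, p)  # modular inverse (Python 3.8+)
        G[c] = [(v * inv) % p for v in G[c]]
        for r in range(D):
            if r != c and G[r][c]:
                f = G[r][c]
                G[r] = [(x - f * y) % p for x, y in zip(G[r], G[c])]
    return G


def encode(a, G, p):
    return [sum(ai * row[j] for ai, row in zip(a, G)) % p
            for j in range(len(G[0]))]


def support(G):
    return sum(1 for row in G for v in row if v)


def products_vanish_only_trivially(D, p):
    """Check: (a G) o (b G) = 0 holds iff a = 0 or b = 0."""
    G = systematic_rs(D, p)
    words = {a: encode(a, G, p) for a in product(range(p), repeat=D)}
    for a, ca in words.items():
        for b, cb in words.items():
            vanishes = all((x * y) % p == 0 for x, y in zip(ca, cb))
            if vanishes != (not any(a) or not any(b)):
                return False
    return True
```

On these toy instances the generator matrix has exactly $D^2$ non-zero entries, matching the support count in the text, and the exhaustive check confirms that the Hadamard product of two codewords vanishes only when one of them is the zero word.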

The (weak) wire checker of [15], while described by using a completely different
terminology, can be seen as implementing an overdefined version (with $D^* =
3D - 2$) of the construction from Def. 2. The linear-algebraic reinterpretation of
QSPs together with the introduction of coding-theoretic terminology allowed us
to better expose the essence of wire checkers. It also allowed us to improve on
the efficiency, and prove the optimality of the new construction.

A wire checker with $U = V = RS_D$ satisfies the even stronger security
requirement that Eq. (1) holds iff either $a = 0$ or $b = 0$. One may hope to pair
up literals corresponding to $x_\eta$ in the $U$ part and literals corresponding to $\bar{x}_\eta$ in
the $V$ part. This is impossible in our application: when we aggregate the wire
checkers, we must use vectors labelled with both negative and positive literals in
the same part, $U$ or $V$, and we cannot pair up columns from $U$ and $V$ that have
different indices. (See Def. 3.) The construction of Def. 2 allows one to do it,
though one has to use $V$ that is a dual of $U$ according to the following definition.

For a labelling $\psi$, we define the dual labelling $\psi_{dual}$, such that $\psi_{dual}(i) = \bar{x}_\eta$ iff
$\psi(i) = x_\eta$. Let $V = U_{dual}$ be the same matrix as $U$, except that it has rows from
$\psi^{-1}(x_\eta)$ and $\psi^{-1}(\bar{x}_\eta)$ switched, for every $\eta$. To simplify the notation, we will not
mention the dual labelling $\psi_{dual}$ unless absolutely necessary, and we will assume
implicitly that (as it was in Def. 2) always $V = U_{dual}$. Now, [15] constructed a
weak wire checker that guarantees consistency if all individual gate checkers are
conscientious. The new wire checker is both more efficient and more secure.

6.3 Aggregate Wire Checker 

Let $Q = (0, 0, U, V, \psi)$, with two $m \times d$ matrices $U$ and $V = U_{dual}$, be a QSP. $Q$ is
an aggregate wire checker (AWC) for circuit $C$, if Eq. (1) holds iff $a, b \in \mathbb{F}^m$ are
consistent bit assignments in the following sense: for each $\eta \in [s_e]$ and $k \in \{0, 1\}$,
either $a_i = 0$ for all $i \in \psi^{-1}(x_\eta^k)$ or $b_i = 0$ for all $i \in \psi^{-1}(x_\eta^k)$.

We construct the AWC by AND-composing wire checkers for the individual
wires. The AWC first resets all vectors $u_i$ and $v_i$ to 0, and precomputes $RS_{D_\eta}$
for all relevant values $D_\eta \le 2(\phi + 1)$. After that, for every wire $\eta$, it sets the
entries in rows, labelled by either $x_\eta$ or $\bar{x}_\eta$, and columns corresponding to wire
$\eta$, according to the $\eta$th wire checker.

We recall from Sect. 6.2 that for the wire checker of some wire to work, the
vectors in $U$ and $V$ of this wire checker must have dual orderings. To keep
notation simple, we will not mention this in what follows.

Theorem 2. Let $\phi \ge 2$. Assume that $C_{bnd}$ is the circuit obtained by the
transformation described in Thm. 1 (including the added dummy gates). For
$\eta \in E(C_{bnd})$, denote $D^*_\eta = 2D_\eta - 1$. Let $d \leftarrow \sum_\eta D^*_\eta$. We obtain the AWC $Q_{awc}$
by merging wire checkers for the individual wires $\eta \in E(C_{bnd})$ as described above.

Proof. Let $m$ be the size of the AWC (see Thm. 3). If $a, b$ are consistent
assignments, then their restrictions to $\psi^{-1}(x_\eta) \cup \psi^{-1}(\bar{x}_\eta)$ are consistent assignments
of the $\eta$th wire. For every $\eta \in E(C_{bnd})$, the $\eta$th wire checker guarantees that
$(\sum_{i=1}^{m} a_i u_{ij})(\sum_{i=1}^{m} b_i v_{ij}) = 0$, for columns $j$ corresponding to this wire, iff the bit
assignments of the $\eta$th wire are consistent. Thus, $(\sum_{i=1}^{m} a_i u_{ij})(\sum_{i=1}^{m} b_i v_{ij}) = 0$
for $j \in [1, d]$ iff the bit assignments of all wires are consistent. □

Theorem 3. Let $\phi^* := 1/(\phi - 1)$. Assume $C$ implements $f : \{0,1\}^n \to \{0,1\}$,
and $s = |C|$. Then $\mathrm{size}(Q_{awc}) \le (6 + 4\phi^*)s - (2 + 8\phi^*)n - 4$, $\mathrm{sdim}(Q_{awc}) \le
(12 + 8\phi^*)s - (6 + 16\phi^*)n - 8$, $\mathrm{sdeg}(Q_{awc}) \le (9 + 6\phi^*)s - (4 + 12\phi^*)n - 6$,
$\mathrm{supp}(Q_{awc}) \le 4(\phi + 1)^2((1 + \phi^*)s + (4 - 2\phi^*)n - 1)$. If $\phi = 3$, then $\mathrm{size}(Q_{awc}) \le
8s - 6n - 4$, $\mathrm{sdim}(Q_{awc}) \le 16s - 14n - 8$, $\mathrm{sdeg}(Q_{awc}) \le 12s - 10n - 6$, and
$\mathrm{supp}(Q_{awc}) \le 72s - 68n - 36$.

Clearly, all parameters except the support are minimized when $\phi$ is large. If the
support is not important, then one can skip the fan-out bounding step, and get size
2s, dimension 12s, and degree 9s.

As in the case of wire checkers, [15] constructed a weak AWC that
guarantees the required “no double assignments” property only if the individual gate
checkers are conscientious. The new AWC does not have this restriction. The
size of the weak AWC from [15] is 24s and its degree is 76s.

7 Circuit Checker 

Next, we combine the aggregate gate and wire checkers into a circuit checker, that
can be seen as a reduction from Circuit-SAT to QSP-SAT. The circuit checker was
called a canonical quadratic span program in [15]. Since [18] introduced canonical
span programs in a completely different context, we changed the terminology.

Let $C$ be a circuit, and let $P_w = (0, 0, U^w, V^w, \psi)$ be an AWC for $C_{bnd}$.
Let $P_g = (u_0, U^g, \varrho)$ be an AGC for $C_{bnd}$. Let $P_g^{dual} = (u_0, V^g, \varrho_{dual})$ be the
corresponding dual span program. As before, $V^g = U^g_{dual}$ and $V^w = U^w_{dual}$, and
$\varrho$ and $\psi$ are related as in Sect. 6.3. Let $m_g = \mathrm{size}(P_w) = \mathrm{size}(P_g) = \mathrm{size}(P_g^{dual})$.
Assume that $U^w = (u^w_1, \ldots, u^w_{m_g})$ and $U^g = (u^g_1, \ldots, u^g_{m_g})$ (and similarly,
$V^w = (v^w_1, \ldots, v^w_{m_g})$ and $V^g$) are ordered consistently (see Sect. 6.3).

Definition 3. For $m_g = \mathrm{size}(P_g)$, $d_g = \mathrm{sdim}(P_g)$ and $d_w = \mathrm{sdim}(P_w)$, define
the circuit checker to be the QSP $c_\Lambda(C) = (u_0, v_0, U, V, \varrho)$, where



Here, $U = (u_1, \ldots, u_m)^T$, $V = U_{dual} = (v_1, \ldots, v_m)^T$.

Recall that we denoted by c2q the algorithm that computes the witness $a$ of the
AGC from $w$. We also denote $(a, b) \leftarrow \mathrm{c2q}(w)$, given that $b$ is the dual of $a$.

Theorem 4. Let $w \in \{0,1\}^{s_e}$. $C(w) = 1$ iff $c_\Lambda(C)(\mathrm{c2q}(w)) = 1$.

Proof. Clearly, $c_\Lambda(C)(a, b) = 1$ iff $P_g$, $P_g^{dual}$ and $P_w$ all accept with the same
witness $(a, b)$: (i) $(\sum_{i=1}^{m} a_i u^g_{ij} - u_{0j})(0 - 1) = 0$ for $j \in [d_g]$ iff $\sum_{i=1}^{m} a_i u^g_{ij} = u_{0j}$
for $j \in [d_g]$ iff $\sum_{i=1}^{m} a_i u^g_i = u_0$, (ii) $(0 - 1)(\sum_{i=1}^{m} b_i v^g_{ij} - u_{0j}) = 0$ for $j \in
[d_g]$ iff $\sum_{i=1}^{m} b_i v^g_{ij} = u_{0j}$ for $j \in [d_g]$ iff $\sum_{i=1}^{m} b_i v^g_i = u_0$, (iii) $(\sum_{i=1}^{m} a_i u^w_{ij}) \cdot
(\sum_{i=1}^{m} b_i v^w_{ij}) = 0$ for $j \in [d_w]$.

Assume $C(w) = 1$. By the construction of $P_g$, there exists $a \in \mathbb{F}^m$, with
$a_i = 0$ for $i \notin \varrho_w^{-1}$, s.t. $a^T \cdot U^g = u_0$. Let $b \leftarrow a$, then also $b^T \cdot V^g = u_0$. Since
$a$ and $b$ are consistent bit assignments in the evaluation of $C(w)$, $P_w$ accepts.

Second, assume that there exist $(a, b)$, s.t. $c_\Lambda(C)(a, b) = 1$. Since $P_w$ accepts,
there are no double assignments. That means that for each $\eta$, for some (possibly
non-unique) bit $w_\eta \in \{0,1\}$ and all $i \in \psi^{-1}(x_\eta^{1 - w_\eta})$, $a_i = 0$. Dually, $b_i = 0$ for all
$i \in \psi_{dual}^{-1}(x_\eta^{1 - w_\eta})$ ($w_\eta$ clearly has to be the same in both cases). Since this holds for
every wire, there exists an assignment $w$ of input values, s.t. for all $i \notin \varrho_w^{-1}$ and
$j \notin (\varrho_{dual})_w^{-1}$, $a_i = b_j = 0$. Moreover, $C(w) = 1$. □

We will explain in the full version how the parameters of $Q := c_\Lambda(C)$ influence
the efficiency of the Circuit-SAT NIZK argument. For example, the support of
$Q$ affects the prover’s computation, while its degree $d$ affects the CRS length but
also the prover’s computation and the security assumption. More precisely, the
prover’s computation of the non-adaptive NIZK argument is $\Theta(\mathrm{supp}(Q) + d \log d)$
non-cryptographic operations and $O(d)$ cryptographic operations. One should
choose $\phi$ such that the prover’s computation will be minimal. This value depends
on the constants in $O$. For simplicity, we will consider the case $\phi = 3$.

Theorem 5. Let $s = |C|$ and $Q := c_\Lambda(C_{bnd})$. Let $\phi$ be the fanout of $C_{bnd}$,
and $\phi^* = 1/(\phi - 1)$. Then $\mathrm{sdeg}(Q) \le (17 + 10\phi^*)s - (6 + 20\phi^*)n - 6$,
$\mathrm{supp}(Q) \le (50 + 8\phi(3 + \phi) + 40\phi^*)s + 2(-13 + 8\phi(3 + 2\phi) - 40\phi^*)n - 8(1 + \phi)^2$,
and $\mathrm{size}(Q) \le \mathrm{size}(P_w) + \mathrm{size}(P_g) \le 2(7 + 4\phi^*)s - (8 + 16\phi^*)n - 4$. If $\phi = 3$, then
$\mathrm{sdeg}(Q) \le 22s - 16n - 6$, $\mathrm{size}(Q) \le 18s - 16n - 4$, and $\mathrm{supp}(Q) \le 214s + 366n - 128$.
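
The specialisation of Thm. 5 to fan-out 3 is a matter of substituting $\phi^* = 1/2$, which the following Python sketch does with exact fractions. The function name and the sample circuit size are illustrative assumptions; the reference values $130s$ (degree) and $36s$ (size) for the earlier circuit checker are the ones quoted in this section and are used only to form ratios.

```python
from fractions import Fraction


def circuit_checker_bounds(s, n, phi):
    """Worst-case bounds of Thm. 5 for s gates, n inputs and fan-out phi."""
    ps = Fraction(1, phi - 1)  # phi* = 1 / (phi - 1)
    return {
        "sdeg": (17 + 10 * ps) * s - (6 + 20 * ps) * n - 6,
        "size": 2 * (7 + 4 * ps) * s - (8 + 16 * ps) * n - 4,
        "supp": ((50 + 8 * phi * (3 + phi) + 40 * ps) * s
                 + 2 * (-13 + 8 * phi * (3 + 2 * phi) - 40 * ps) * n
                 - 8 * (1 + phi) ** 2),
    }


# Illustrative values only: s = 1000 gates, n = 10 inputs, fan-out 3.
b = circuit_checker_bounds(1000, 10, 3)
degree_gain = Fraction(130 * 1000) / b["sdeg"]  # versus degree 130s
size_gain = Fraction(36 * 1000) / b["size"]     # versus size 36s
```

For large $s$ the two ratios tend to $130/22 \approx 5.9$ and $36/18 = 2$, which is where the "about 6 times" and "2 times" figures come from.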

The degree of the circuit checker from [15] is 130s and its size is 36s. Thus, even
when $\phi = 3$, we have improved on their construction about 6 times degree-wise
and 2 times size-wise. The QSP-SAT witness $(a, b)$ can be computed in linear
time $O(s)$ by using the algorithm c2q.

8 Two-Query Linear PCP for Circuit-SAT

In Thm. 4 we presented a reduction from Circuit-SAT to QSP-SAT. That
is, we showed that if for some $w$, $C(w) = 1$, then one can efficiently construct a
witness $(a, b) = \mathrm{c2q}(w)$ such that $c_\Lambda(C)(a, b) = 1$. In this section, we construct
a two-query non-adaptive linear PCP [2] for Circuit-SAT. In the rest of the
paper, we modify this to a succinct three-query non-adaptive linear PCP, to a
non-adaptive linear interactive proof and finally to a non-adaptive non-interactive
zero knowledge argument. Here, non-adaptivity means that the query algorithm
(in the linear PCP and linear interactive proof) or the CRS generation algorithm
(in the NIZK argument) may depend on the statement $C$.

Let $\mathcal{R} = \{(C, w)\}$ be a binary relation, $\mathbb{F}$ be a finite field, $P_{lpcp}$ be a
deterministic prover algorithm and $V_{lpcp} = (Q_{lpcp}, D_{lpcp})$, where $Q_{lpcp}$ is a probabilistic
query algorithm and $D_{lpcp}$ is an oracle deterministic decision algorithm. The pair
$(P_{lpcp}, V_{lpcp})$ is a non-adaptive $k$-query linear PCP [2] for $\mathcal{R}$ over $\mathbb{F}$ with query
length $m$ if it satisfies the following conditions.

Syntax: on any input $C$ and oracle $\pi$, the verifier $V_{lpcp}$ works as follows.
$Q_{lpcp}(C)$ generates $k$ queries $q_1, \ldots, q_k \in \mathbb{F}^m$ to $\pi$, and a state
information st. Given $k$ oracle answers $z_1 \leftarrow \langle \pi, q_1 \rangle, \ldots, z_k \leftarrow \langle \pi, q_k \rangle$, such that
$z = (z_1, \ldots, z_k)$, $D^{\pi}_{lpcp}(\mathrm{st}; w) = D_{lpcp}(\mathrm{st}, z; w)$ accepts or rejects.

Completeness: for every $(C, w) \in \mathcal{R}$, the output of $P_{lpcp}(C, w)$ is a
description of a linear function $\pi : \mathbb{F}^m \to \mathbb{F}$ such that $D^{\pi}_{lpcp}(\mathrm{st}; w)$ accepts with
probability 1.

Knowledge: there exists a knowledge extractor $X_{lpcp}$, such that for every linear
function $\pi^* : \mathbb{F}^m \to \mathbb{F}$: if the probability that $V^{\pi^*}_{lpcp}(C)$ accepts is at least $\varepsilon$,
then $X^{\pi^*}_{lpcp}(C)$ outputs $w$ such that $(C, w) \in \mathcal{R}$.

$(P_{lpcp}, V_{lpcp})$ has degree $(d_Q, d_D)$, if $Q_{lpcp}$ (resp., $D_{lpcp}$) can be computed by an
arithmetic circuit of degree $d_Q$ (resp., $d_D$).

We remark that in the following non-adaptive linear PCP, $D_{lpcp}$ does not
depend on $w$.

Theorem 6. Let $\mathbb{F}$ be a field, and let $C$ be a circuit with dummy gates. Let $P^{(2)}_{lpcp}$
and $V^{(2)}_{lpcp} = (Q^{(2)}_{lpcp}, D^{(2)}_{lpcp})$ be as follows:

$Q^{(2)}_{lpcp}(C)$: $Q \leftarrow c_\Lambda(C)$; $m \leftarrow \mathrm{size}(Q)$; $q_u \leftarrow (u_i, 0^m)_{i=1}^{m}$; $q_v \leftarrow (0^m, v_i)_{i=1}^{m}$;
$q \leftarrow (q_u, q_v)$; $\mathrm{st} \leftarrow (u_0, v_0)$; return $(q, \mathrm{st})$;

$P^{(2)}_{lpcp}(C, w)$: $Q \leftarrow c_\Lambda(C)$; $(\pi_u, \pi_v) = (a, b) \leftarrow \mathrm{c2q}(w)$; return $\pi = (\pi_u, \pi_v)$;

$D^{(2)}_{lpcp}(\mathrm{st}, (z_u, z_v); w)$: if $(z_u - u_0) \circ (z_v - v_0) = 0$ then return 1 else return 0;

$(P^{(2)}_{lpcp}, V^{(2)}_{lpcp})$ is a non-adaptive 2-query linear PCP for Circuit-SAT with query
length $2md$ and knowledge error 0.

Proof. Completeness: Clearly, $z_u \leftarrow \langle \pi, q_u \rangle = \sum_{i=1}^m a_i u_i$ and $z_v \leftarrow \langle \pi, q_v \rangle =
\sum_{i=1}^m b_i v_i$. Thus, $z_u - u_0 = a^T \cdot U - u_0$ and $z_v - v_0 = b^T \cdot V - v_0$, and the
circuit checker accepts.

Knowledge property: Due to the construction of $Q^{(2)}_{lpcp}$, $z_u = \sum_{i=1}^m a_i u_i$,
and $z_v = \sum_{i=1}^m b_i v_i$. If $D^{(2)}_{lpcp}$ accepts, then by Thm. 01 the wire checker implies
that no wire $\eta$ gets a double assignment. However, it may be the case that some
wire has no assignment. Nevertheless, on input (st, C) and access to the oracle
$\pi^*$, we will now extract a Circuit-SAT witness $w = (w_\eta)_\eta$ (i.e., the vector
of wire values) such that C(w) = 1.

First, the extractor obtains the whole linear function $\pi^* = (a, b)$, by querying
the oracle $\pi^*$ up to 2m times. We deduce w from $\pi^*$ as follows.

Let $\eta$ be any wire of the circuit C. Since the wire checker accepts, the gate
checkers of its neighbouring gates do not assign multiple values to the wire $\eta$.
There are two different cases.

If $\eta$ is an input wire to the circuit, then its output gate $\iota$ is a conscientious
dummy gate. Therefore, the value $w_\eta$ can be extracted from the local values of
a corresponding to the gate $\iota$.

Assume that $\eta$ is an internal wire. Since all gates implement functions with
well-defined outputs, the gate checker of the input gate of $\eta$ assigns some value
$w_\eta$ to this wire. Moreover, every output gate $\iota$ of $\eta$ either assigns the same value
$w_\eta$ or does not assign any value. In the latter case, the output value of $\iota$ does not
depend on $w_\eta$, and thus assigning $w_\eta$ to $\eta$ is consistent with the output value
of $\iota$. Therefore, also here the value $w_\eta$ can be extracted, but this time from the
local values of a and b corresponding to the input gate of $\eta$. □

A simple corollary of this theorem is that the algorithm c2q is efficiently invert- 
ible. Thus, the constructed NP-reduction from Circuit-SAT to QSP-SAT 
preserves knowledge (i.e., it is a Levin reduction). 

Note that the communication and computation can be optimized by defining
$q_u \leftarrow (u_i)_{i=1}^m$, $q_v \leftarrow (v_i)_{i=1}^m$, and computing say $z_u \leftarrow \langle \pi_u, q_u \rangle$.

9 Succinct 3-Query Linear PCP from Polynomial QSPs 

Since we are interested in succinct arguments, we need to be able to compress the
witness vectors a and b. As in [15], we will do it by using polynomial interpolation
to define polynomial QSPs. We employ the Schwartz-Zippel lemma to show that
the resulting succinct 3-query linear PCP has the knowledge property.

9.1 Polynomial Span Programs and QSPs 

Instead of considering the target and row vectors of a span program or a QSP as
being members of the vector space $\mathbb{F}^d$, interpret them as degree-(d - 1) polynomials
in $\mathbb{F}[X]$. The map $u \mapsto u(X)$ is implemented by choosing d different field
elements (that are the same for all vectors u) $r_j \leftarrow \mathbb{F}$, and then defining a
degree-($\le d - 1$) polynomial u(X) via polynomial interpolation, so that $u(r_j) = u_j$ for
all $j \in [d]$. This maps the vectors $u_i$ of the original span program P to polynomials
$u_i(X)$, and the target vector $u_0$ to the polynomial $u_0(X)$. Finally, let
$Z(X) := \prod_{j=1}^d (X - r_j)$; this polynomial can be thought of as a mapping of the
all-zero vector 0 = (0, . . . , 0).

The choice of $r_j$ influences efficiency. If $r_j$ are arbitrary, then multipoint
evaluation and polynomial interpolation take time $O(d \log^2 d)$ [12]. If d is a power of
2 and $r_j = \omega_d^j$, where $\omega_d$ is the dth primitive root of unity, then both operations
can be done in time $O(d \log d)$ by using Fast Fourier Transform [TS]. In what
follows, d and $r_j$ are chosen as in the current paragraph.

Clearly, $u_0$ is in the span of the vectors that belong to $U_w$ iff $u_0 =
\sum_{i \in \varrho^{-1}} a_i u_i$ for some $a_i \in \mathbb{F}$. The latter is equivalent to the requirement
that Z(X) divides $u(X) := \sum_{i \in \varrho^{-1}} a_i u_i(X) - u_0(X)$. Indeed, $u_0$ is the vector
of evaluations of $u_0(X)$, and $u_i$ is the vector of evaluations of $u_i(X)$. Thus,
$\sum a_i u_i - u_0 = 0$ iff $\sum a_i u_i(X) - u_0(X)$ evaluates to 0 at all $r_j$, and hence is
divisible by Z(X).

A polynomial span program $P = (u_0, U, \varrho)$ over a field $\mathbb{F}$ consists of a target
polynomial $u_0(X) \in \mathbb{F}[X]$, a tuple $U = (u_i(X))_{i=1}^m$ of polynomials from $\mathbb{F}[X]$,
and a labelling $\varrho : [m] \to \{x_\iota, \bar{x}_\iota : \iota \in [n]\} \cup \{\bot\}$ of the polynomials from U. Let
$U_w$ be the subset of U consisting of those polynomials whose labels are satisfied
by the assignment $w \in \{0,1\}^n$, that is, by $\{x_\iota^{w_\iota} : \iota \in [n]\} \cup \{\bot\}$. The span
program P computes a function f, if for all $w \in \{0,1\}^n$: there exists $a \in \mathbb{F}^m$
such that $Z(X) \mid (u_0(X) + \sum_{u \in U_w} a_u u(X))$ (P accepts) iff f(w) = 1.

Alternatively, P accepts $w \in \{0,1\}^n$ iff there exists a vector $a \in \mathbb{F}^m$, with
$a_i = 0$ for all $i \notin \varrho^{-1}$, s.t. $Z(X) \mid \sum_{i=1}^m a_i u_i(X) - u_0(X)$. The size of P is m and
the degree of P is deg Z(X).

Definition 4. A polynomial QSP $Q = (u_0, v_0, U, V, \varrho)$ over a field $\mathbb{F}$ consists
of target polynomials $u_0(X) \in \mathbb{F}[X]$ and $v_0(X) \in \mathbb{F}[X]$, two tuples
$U = (u_i(X))_{i=1}^m$ and $V = (v_i(X))_{i=1}^m$ of polynomials from $\mathbb{F}[X]$, and a labelling
$\varrho : [m] \to \{x_\iota, \bar{x}_\iota : \iota \in [n]\} \cup \{\bot\}$. Q accepts an input $w \in \{0,1\}^n$
iff there exist two vectors a and b from $\mathbb{F}^m$, with $a_i = 0 = b_i$ for all $i \notin \varrho^{-1}$,
s.t. $Z(X) \mid \left(\sum_{i=1}^m a_i u_i(X) - u_0(X)\right)\left(\sum_{i=1}^m b_i v_i(X) - v_0(X)\right)$. Q computes a
Boolean function $f : \{0,1\}^n \to \{0,1\}$ if Q accepts w iff f(w) = 1.

The size of Q is m and the degree of Q is deg Z(X). Keeping in mind the 
reinterpretation of span programs, Def. 0] is clearly equivalent to Def. Q] (Also 
here, V = f/d U ai, with the dual operation defined appropriately.) 

To get from the linear-algebraic interpretation to polynomial interpretation,
one has to do the following. Assume that the dimension of the QSP is d and that
the size is m. Let $r_j \leftarrow \omega_d^j$, $j \in [d]$. For $i \in [m]$, interpolate the polynomial $u_i(X)$
(resp., $v_i(X)$) from the values $u_i(r_j) = u_{ij}$ (resp., $v_i(r_j) = v_{ij}$) for $j \in [d]$. Set
$Z(X) := \prod_{j=1}^d (X - r_j)$. The labelling $\psi$ is left unchanged. It is clear that the
resulting polynomial QSP $(u_0, v_0, U, V, \psi)$ computes the same Boolean function
as the original QSP.
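
The passage from vectors to polynomials can be checked on a toy instance. The sketch below is only an illustration under stated assumptions: it works over the small prime field with P = 97 and uses the interpolation points 1, 2, 3 instead of roots of unity (neither choice comes from the paper), and it uses naive quadratic-time interpolation rather than FFT. It takes the two difference vectors (sum of a_i u_i minus u_0, and sum of b_i v_i minus v_0) directly as inputs and tests whether Z(X) divides the product of their interpolated polynomials, which should happen exactly when their componentwise product is the zero vector.

```python
# Toy check that "componentwise product is zero" matches "Z(X) divides the
# product of the interpolated polynomials". P = 97 is an arbitrary small prime.
P = 97

def poly_mul(f, g):
    """Multiply coefficient lists (lowest degree first) over F_P."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def poly_add(f, g):
    """Add coefficient lists over F_P, padding the shorter one with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def interpolate(points, values):
    """Lagrange interpolation: polynomial u of degree <= d-1 with u(r_j) = values[j]."""
    result = [0]
    for j, (rj, vj) in enumerate(zip(points, values)):
        basis, denom = [1], 1
        for k, rk in enumerate(points):
            if k != j:
                basis = poly_mul(basis, [(-rk) % P, 1])
                denom = denom * (rj - rk) % P
        coeff = vj * pow(denom, -1, P) % P
        result = poly_add(result, [(coeff * c) % P for c in basis])
    return result

def vanishing_poly(points):
    """Z(X) = product over j of (X - r_j)."""
    z = [1]
    for r in points:
        z = poly_mul(z, [(-r) % P, 1])
    return z

def poly_divmod(f, g):
    """Polynomial long division over F_P; returns (quotient, remainder)."""
    f = f[:]
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv_lead = pow(g[-1], -1, P)
    for i in range(len(f) - len(g), -1, -1):
        c = f[i + len(g) - 1] * inv_lead % P
        q[i] = c
        for j, b in enumerate(g):
            f[i + j] = (f[i + j] - c * b) % P
    return q, f[:len(g) - 1]

def z_divides_product(points, x_vals, y_vals):
    """True iff Z(X) divides x(X) * y(X) for the interpolated x and y."""
    x = interpolate(points, x_vals)
    y = interpolate(points, y_vals)
    _, rem = poly_divmod(poly_mul(x, y), vanishing_poly(points))
    return all(c == 0 for c in rem)
```

For example, with points (1, 2, 3), the vectors (4, 0, 0) and (0, 7, 5) have zero componentwise product and the divisibility test succeeds, while (4, 0, 0) and (1, 7, 5) fail it because the product is nonzero at the first point.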

The polynomial circuit checker $c^{poly}_\wedge(C) = (u_0, v_0, U, V, \psi)$, with $U =
(u_0, \ldots, u_m)$ and $V = (v_0, \ldots, v_m)$, is the polynomial version of $c_\wedge(C)$.

Theorem 7. Let $w \in \{0,1\}^n$. C(w) = 1 iff $c^{poly}_\wedge(C)(c2q(w)) = 1$.

Proof. Follows from Thm. 0] and the construction of polynomial QSPs. □ 

9.2 Succinct Three-Query Linear PCP

To achieve better efficiency, following [2], we define a 3-query linear PCP with
|z| = O(1) that is based on the polynomial QSPs. For a set $\mathcal{P}$ of polynomials, let
span($\mathcal{P}$) be their span (i.e., the set of $\mathbb{F}$-linear combinations). Then, u is in the
span of vectors $u_i$, $u = \sum a_i u_i$, iff the corresponding interpolated polynomial
u(X) is in the span of polynomials $u_i(X)$, i.e., $u(X) = \sum a_i u_i(X)$.

Let $\mathbb{F}$ be any field. We recall that according to the Schwartz-Zippel lemma,
for any nonzero polynomial $f : \mathbb{F}^m \to \mathbb{F}$ of total degree d and any finite subset
S of $\mathbb{F}$, $\Pr_{x \leftarrow S^m}[f(x) = 0] \le d/|S|$.
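
The bound can be sanity-checked by brute force on small parameters. The following sketch is an illustration only: it assumes S is a whole prime field of small order (13 in the usage note below), and the helper name count_zeros is hypothetical. It enumerates every point of S^m and counts the zeros of a polynomial given as a Python function.

```python
from itertools import product

def count_zeros(f, field_size, num_vars):
    """Count zeros of f over (Z/field_size)^num_vars by exhaustive enumeration.

    Returns (number_of_zeros, number_of_points).
    """
    zeros = 0
    for point in product(range(field_size), repeat=num_vars):
        if f(*point) % field_size == 0:
            zeros += 1
    return zeros, field_size ** num_vars
```

For f(x, y) = xy - 1 over the field with 13 elements (total degree 2), every nonzero x has exactly one matching y, so there are 12 zeros among 169 points, which is below the bound 2/13.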

Theorem 8. Let $\mathbb{F}$ be a field, and C a circuit with dummy gates. Let $P^{(3)}_{lpcp}$ and
$V^{(3)}_{lpcp} = (Q^{(3)}_{lpcp}, D^{(3)}_{lpcp})$ be as follows. Here, PolyInt is polynomial interpolation.
Q\pl p (C) : Q ca(C); to 4- size(Q); d 4- sdeg(Q); For i <- 1 to d do: n 4- u l d ; 
o 4— r F; Compute (a l )f~Q,- Z(o) 4— n^=i — r j)i Compute (wj(cr))£L 0 , 
(vi(a))VL 0 ; st 4- (Z(a),u 0 (a),v 0 (c r)); q u 4- (((^(et))^, 0 m , 0 d ); q v 4- 
(Om>(M cr ))%v°d) Qh 4- (Om,Om,(^)to); 9 (Qu,Qv,qh); return 

(«> st); 

V™(C,w): Compute (Q,m,(ri)f =1 ) as in Q\ p l p (C); ( a,b ) 4— c2q(m); 4— 

«o + YmLi a i u i> 4- Polylnt((rj,ut)^ =1 ); v * 4- v 0 + Y^Li a i v H 

&(X) 4- Polylnt((rj, u|)^ =1 ); Z(X) <- Ylti( X ~c); h(X) = £fo ^ 
tf(X)tf(X)/Z(X) 6 F d ~ 2 ; return n = (tt u , n v , n h ) 4- (a, b, h) e F 2m+d ; 
£ , ip 3 C p( s t, (z u , z v , z h ); w): if (z u - u 0 (a )) • (z v - v 0 (a)) = Z(a) ■ z h then return 1 
else return 0; 

$(P^{(3)}_{lpcp}, V^{(3)}_{lpcp})$ is a non-adaptive 3-query linear PCP over $\mathbb{F}$ for Circuit-SAT with
query length 2m + d and knowledge error $2d/|\mathbb{F}|$.

Proof. Completeness: again straightforward, since z u = u w (a) 4— (tt. q u ) = 
YZx a A(o-), Z v = v(a) 4- ( 7 r, q v ) = Mi (o’), and z h = K a ) 4- {n, qn) = 
Yli=o hicr 1 . Knowledge: assume that the verifier accepts with probability 
e > 2d/|F|. That is, P w F [(£™ x aA(u) - a M") ~ W) = 

Z(a) ■ (YliZo hio 1 )} = e. Due to the Schwartz-Zippel lemma, since e > 2d/|F|, 
(E™i *MX)-u 0 (X))(YZx aM x )-MX)) = Z(X). (Etc 1 hiX% and due 
to the equivalence between QSPs and polynomial QSPs, Eq. (]T|) holds. The claim 
now follows from Thm. [5] □ 

Theorem 9. Assume d is a power of 2. $P^{(3)}_{lpcp}$ runs in time $O(d \log d)$, $Q^{(3)}_{lpcp}$ runs
in time $O(d \log d)$, and the time of $D^{(3)}_{lpcp}$ is dominated by 2 $\mathbb{F}$-additions and by 2
$\mathbb{F}$-multiplications. $V^{(3)}_{lpcp}$ has degree (d, 2).

A similar result was proven in [15] (though without using the terminology of
linear PCPs) in the case of conscientious gate checkers. We only require the
dummy gates to be conscientious.

In [15], it was only shown that h(X) can be computed by using multipoint
evaluation and polynomial interpolation in time $\Theta(d \log^2 d)$. Moreover, the
computation of V was $\Theta(n)$ due to a different extraction technique.

10 From Non-Adaptive Linear PCP to Adaptive NIZK

Given the 3-query linear PCP of Thm. 8, one can use the transformation [2] to
construct first a non-adaptive NIZK argument for Circuit-SAT. See the full
version. The non-adaptive NIZK argument can be made adaptive by using
universal circuits [25]; see PH for details.

We will provide more details in the full version [2D]. There, we will also provide 
a direct construction of the non-adaptive NIZK argument. The latter has a (quite 
complex) soundness proof related to the soundness proof from [TS] that results 
in the use of a weaker security assumption. Here, we state only the following 
straightforward corollary of Thm. [D] and the transformations from [2] . 

Theorem 10. Assume d is a power of 2. There exists a non-adaptive NIZK
Circuit-SAT argument, s.t. the prover and the CRS generation take $O(d \log d)$
cryptographic operations, the verification time is dominated by O(1) pairings,
and the communication is O(1) group elements.

Abstract. We construct new families of elliptic curves over $\mathbb{F}_{p^2}$ with
efficiently computable endomorphisms, which can be used to accelerate
elliptic curve-based cryptosystems in the same way as Gallant-Lambert-Vanstone
(GLV) and Galbraith-Lin-Scott (GLS) endomorphisms. Our
construction is based on reducing quadratic Q-curves (curves defined
over quadratic number fields, without complex multiplication, but with
isogenies to their Galois conjugates) modulo inert primes. As a first
application of the general theory we construct, for every prime p > 3, two
one-parameter families of elliptic curves over $\mathbb{F}_{p^2}$ equipped with
endomorphisms that are faster than doubling. Like GLS (which appears as a
degenerate case of our construction), we offer the advantage over GLV
of selecting from a much wider range of curves, and thus finding secure
group orders when p is fixed. Unlike GLS, we also offer the possibility of
constructing twist-secure curves. Among our examples are prime-order
curves over $\mathbb{F}_{p^2}$, equipped with fast endomorphisms, and with
almost-prime-order twists, for the particularly efficient primes $p = 2^{127} - 1$ and
$p = 2^{255} - 19$.

Keywords: Elliptic curve cryptography, endomorphisms, GLV, GLS, 
exponentiation, scalar multiplication, Q-curves. 


1 Introduction 

Let $\mathcal{E}$ be an elliptic curve over a finite field $\mathbb{F}_q$, and let $\mathcal{G} \subseteq \mathcal{E}(\mathbb{F}_q)$ be a cyclic
subgroup of prime order N. When implementing cryptographic protocols in $\mathcal{G}$,
the fundamental operation is scalar multiplication (or exponentiation):

Given P in $\mathcal{G}$ and m in $\mathbb{Z}$, compute $[m]P := P \oplus \cdots \oplus P$ (m times).

The literature on general scalar multiplication algorithms is vast, and we
will not explore it in detail here (see [10, §2.8, §11.2] and [5, Chapter 9] for
introductions to exponentiation and multiexponentiation algorithms). For our
purposes, it suffices to note that the dominant factor in scalar multiplication
time using conventional algorithms is the bitlength of m. As a basic example,
if $\mathcal{G}$ is a generic cyclic abelian group, then we may compute [m]P using a variant

K. Sako and P. Sarkar (Eds.) ASIACRYPT 2013 Part I, LNCS 8269, pp. 61-f7H1 2013. 
of the binary method, which requires at most $\lceil \log_2 m \rceil$ doublings and (in the
worst case) about as many addings in $\mathcal{G}$.
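
As a minimal sketch of the binary method, the code below runs left-to-right double-and-add in an abstract group supplied through two callables, and counts the group operations it performs. The interface (add, double) and the toy group used in the usage note are assumptions made for illustration; they are not taken from the paper.

```python
def binary_scalar_mult(m, P, add, double):
    """Left-to-right double-and-add for m >= 1.

    Returns ([m]P, number_of_doublings, number_of_additions).
    """
    if m < 1:
        raise ValueError("this sketch only handles m >= 1")
    result = P
    doublings = additions = 0
    # Skip the leading 1 bit: it is accounted for by starting at P.
    for bit in bin(m)[3:]:
        result = double(result)
        doublings += 1
        if bit == "1":
            result = add(result, P)
            additions += 1
    return result, doublings, additions
```

In the additive group of integers modulo 1009, computing [77]5 this way uses 6 doublings and 3 additions, since 77 is 1001101 in binary.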

But elliptic curves are not generic groups: they have a rich and concrete ge- 
ometric structure, which should be exploited for fun and profit. For example, 
endomorphisms of elliptic curves may be used to accelerate generic scalar mul- 
tiplication algorithms, and thus to accelerate basic operations in curve-based 
cryptosystems. 

Suppose $\mathcal{E}$ is equipped with an efficient endomorphism $\psi$, defined over $\mathbb{F}_q$. By
efficient, we mean that we can compute the image $\psi(P)$ of any point P in $\mathcal{E}(\mathbb{F}_q)$
for the cost of O(1) operations in $\mathbb{F}_q$. In practice, we want this to cost no more
than a few doublings in $\mathcal{E}(\mathbb{F}_q)$.

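Assume $\psi(\mathcal{G}) \subseteq \mathcal{G}$, or equivalently, that $\psi$ restricts to an endomorphism of $\mathcal{G}$.¹
Now $\mathcal{G}$ is a finite cyclic group, isomorphic to $\mathbb{Z}/N\mathbb{Z}$; and every endomorphism
of $\mathbb{Z}/N\mathbb{Z}$ is just an integer multiplication modulo N. Hence, $\psi$ acts on $\mathcal{G}$ as
multiplication by some integer eigenvalue $\lambda_\psi$: that is,

$\psi|_{\mathcal{G}} = [\lambda_\psi]_{\mathcal{G}}$ .

The eigenvalue $\lambda_\psi$ is a root of the characteristic polynomial of $\psi$ in $\mathbb{Z}/N\mathbb{Z}$.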

Returning to the problem of scalar multiplication: we want to compute [m]P.
Rewriting m as

$m \equiv a + b\lambda_\psi \pmod{N}$

for some a and b, we can compute [m]P using the relation

$[m]P = [a]P + [b\lambda_\psi]P = [a]P + [b]\psi(P)$

and a two-dimensional multiexponentiation such as Straus's algorithm [28], which
has a loop length of $\log_2 \|(a, b)\|_\infty$ (i.e., $\log_2 \|(a, b)\|_\infty$ doubles and as many adds;
recall that $\|(a, b)\|_\infty = \max(|a|, |b|)$). If $\lambda_\psi$ is not too small, then we can easily
find (a, b) such that $\log_2 \|(a, b)\|_\infty$ is roughly half of $\log_2 N$. (We remove the "If"
and the "roughly" for our $\psi$ in §4.) The endomorphism lets us replace conventional
$\log_2 N$-bit scalar multiplications with $\frac{1}{2}\log_2 N$-bit multiexponentiations.
In terms of basic binary methods, we are halving the loop length, cutting the
number of doublings in half.
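
The halving of the loop length can be seen in a toy model. The sketch below is a simplified illustration, not the construction of this paper: the group is the additive group of integers modulo N = 1009, the "endomorphism" is multiplication by an assumed eigenvalue of 32, and the decomposition m = a + 32b is a naive base-32 split rather than a lattice-based one. The joint loop is the simplest (windowless) form of simultaneous multiexponentiation in the style of Straus.

```python
def straus_double_mult(a, b, P, Q, add, double, identity):
    """Compute [a]P + [b]Q for a, b >= 0 with a single shared doubling chain.

    Returns (result, loop_length), where loop_length is the number of doublings.
    """
    PQ = add(P, Q)  # precomputed once
    result = identity
    loop_length = max(a.bit_length(), b.bit_length())
    for i in reversed(range(loop_length)):
        result = double(result)
        bit_a, bit_b = (a >> i) & 1, (b >> i) & 1
        if bit_a and bit_b:
            result = add(result, PQ)
        elif bit_a:
            result = add(result, P)
        elif bit_b:
            result = add(result, Q)
    return result, loop_length
```

With N = 1009, eigenvalue 32, P = 7 and m = 1000, the split is a = 8 and b = 31, and the joint loop has length 5 instead of the 10 bits of m.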

Of course, in practice we are not halving the execution time. The precise
speedup ratio depends on a variety of factors, including the choice of
exponentiation and multiexponentiation algorithms, the cost of computing $\psi$, the shortness
of a and b on the average, and the cost of doublings and addings in terms of
bit operations, to say nothing of the cryptographic protocol, which may
prohibit some other conventional speedups. For example: in [11], Galbraith, Lin,

1 This assumption is satisfied almost by default in the context of classical discrete
log-based cryptosystems. If $\psi(\mathcal{G}) \not\subseteq \mathcal{G}$, then $\mathcal{E}[N](\mathbb{F}_q) = \mathcal{G} + \psi(\mathcal{G}) \cong (\mathbb{Z}/N\mathbb{Z})^2$, so $N^2 \mid
\#\mathcal{E}(\mathbb{F}_q)$ and $N \mid q - 1$; such $\mathcal{E}$ are cryptographically inefficient, and discrete logs in
G are vulnerable to the Menezes-Okamoto-Vanstone reduction EQ- However, these 
G do arise naturally in pairing-based cryptography; in that context the assumption 
should be verified carefully. 
and Scott report experiments where cryptographic operations on GLS curves
required between 70% and 83% of the time required for the previous best practice
curves, with the variation depending on the architecture, the underlying point
arithmetic, and the protocol.

To put this technique into practice, we need a source of cryptographic elliptic
curves equipped with efficient endomorphisms. To date, in the large
characteristic case,² there have been essentially only two constructions:

1. The classic Gallant-Lambert-Vanstone (GLV) construction [15]. Here,
elliptic curves over number fields with explicit complex multiplication (CM) by
CM-orders with small discriminants are reduced modulo suitable primes p:
an explicit endomorphism on the CM curve reduces to an efficient
endomorphism over the finite field.

2. The more recent Galbraith-Lin-Scott (GLS) construction [11]. Here, curves
over $\mathbb{F}_p$ are viewed over $\mathbb{F}_{p^2}$; the p-power sub-Frobenius induces an extremely
efficient endomorphism on the quadratic twist (which can have prime order).

These constructions have since been combined to give 3- and 4-dimensional vari- 
ants |18I32| , and extended to hyperelliptic curves in a variety of ways [3117126155] . 
However, basic GLV and GLS remain the archetypal constructions. 

Our contribution: new families of endomorphisms. In this work, we propose a 
new source of elliptic curves over F p 2 with efficient endomorphisms: quadratic 
Q-curves. 

Definition 1. A quadratic Q-curve of degree d is an elliptic curve $\mathcal{E}$ without
CM, defined over a quadratic number field K, such that there exists an isogeny
of degree d from $\mathcal{E}$ to its Galois conjugate ${}^\sigma\mathcal{E}$, where $\langle \sigma \rangle = \mathrm{Gal}(K/\mathbb{Q})$.³

Q-curves are well-established objects of interest in number theory, where they 
formed a natural setting for generalizations of the Modularity Theorem. Ellen- 
berg’s survey [5] gives an excellent introduction to this beautiful theory. 

Our application of quadratic Q-curves is rather more prosaic: given a d-isogeny
$\mathcal{E} \to {}^\sigma\mathcal{E}$ over a quadratic field, we reduce modulo an inert prime p to obtain an
isogeny $\widetilde{\mathcal{E}} \to {}^\sigma\widetilde{\mathcal{E}}$ over $\mathbb{F}_{p^2}$. We then exploit the fact that the p-power Frobenius
isogeny maps ${}^\sigma\widetilde{\mathcal{E}}$ back onto $\widetilde{\mathcal{E}}$; composing with the reduced d-isogeny, we obtain
an endomorphism of $\widetilde{\mathcal{E}}$ of degree dp. For efficiency reasons, d must be small; it
turns out that for small values of d, we can write down one-parameter families
of Q-curves (our approach below was inspired by the explicit techniques of
Hasegawa [16]). We thus obtain one-parameter families of elliptic curves over $\mathbb{F}_{p^2}$
equipped with efficient non-integer endomorphisms. For these endomorphisms we
can give convenient explicit formulae for short scalar decompositions (see §4).

For concrete examples, we concentrate on the cases d = 2 and 3 (in §5 and §6
respectively), where the endomorphism is more efficient than a single doubling

2 We are primarily interested in the large characteristic case, where q = p or $p^2$, so we
will not discuss $\tau$-adic/Frobenius expansion-style techniques here.

3 The Galois conjugate ${}^\sigma\mathcal{E}$ is the curve formed by applying $\sigma$ to all of the coefficients
of the defining equation of $\mathcal{E}$; see §2.
(we briefly discuss higher degrees in §7). For maximum generality and flexibility,
we define our curves in short Weierstrass form; but we include transformations 
to Montgomery, twisted Edwards, and Doche-Icart-Kohel models where appro- 
priate in m 

Comparison with GLV. Like GLV, our method involves reducing curves defined 
over number fields to obtain curves over finite fields with explicit CM. However, 
we emphasise a profound difference: in our method, the curves over number fields 
generally do not have CM themselves. 

GLV curves are necessarily isolated examples, and the really useful examples
are extremely limited in number (see [18, App. A] for a list of curves). The
scarcity of GLV curves⁴ is their Achilles' heel: as noted in [11], if p is fixed
then there is no guarantee that there will exist a GLV curve with prime (or
almost-prime) order over $\mathbb{F}_p$. Consider the situation discussed in [11, §1]: the
most efficient GLV curves have CM discriminants -3 and -4. If we are working
at a 128-bit security level, then the choice $p = 2^{255} - 19$ allows particularly fast
arithmetic in $\mathbb{F}_p$. But the largest prime factor of the order of a curve over $\mathbb{F}_p$
with CM discriminant -4 (resp. -3) has 239 (resp. 230) bits: using these curves
wastes 9 (resp. 13) potential bits of security. In fact, we are lucky with D = -3
and -4: for all of the other discriminants offering endomorphisms of degree at
most 3, we can do no better than a 95-bit prime factor, which represents a
catastrophic 80-bit loss of relative security.

In contrast, our construction yields true families of curves, covering ~ p
isomorphism classes over $\mathbb{F}_{p^2}$. This gives us a vastly higher probability of finding
prime (or almost-prime)-order curves over practically important fields.

Comparison with GLS. Like GLS, we construct curves over $\mathbb{F}_{p^2}$ equipped with
an inseparable endomorphism. While these curves are not defined over the prime
field, the fact that the extension degree is only 2 means that Weil descent attacks
offer no advantage when solving DLP instances (see [11, §9]). And like GLS, our
families offer around p distinct isomorphism classes of curves, making it easy to
find secure group orders when p is fixed.

But unlike GLS, our curves have j-invariants in F p 2 : they are not isomorphic to 
or twists of subfield curves. This allows us to find twist-secure curves, which are 
resistant to the Fouque-Lercier-Real-Valette fault attack [9] . As we will see in 
our construction reduces to GLS in the degenerate case d = 1 (that is, where 

4 The scarcity of useful GLV curves is easily explained: efficient separable endomor- 
phisms have extremely small degree (so that the dense defining polynomials can be 
evaluated quickly). But the degree of the endomorphism is the norm of the corre- 
sponding element of the CM-order; and to have non-integers of very small norm, 
the CM-order must have a tiny discriminant. Up to twists, the number of elliptic 
curves with CM discriminant D is the Kronecker class number h(D), which is in
$O(\sqrt{D})$. Of course, for the tiny values of D in question, the asymptotics of h(D)
are irrelevant; for the six D corresponding to endomorphisms of degree at most 3,
we have h(D) = 1, so there is only one j-invariant. For D = -4 (corresponding to
j = 1728) there are two or four twists over F p ; for D = — 3 (corresponding to j = 0) 
we have two or six, and otherwise we have only two. In particular, there are at most 
18 distinct curves over F p with a non-integer endomorphism of degree at most 3. 
$\phi$ is an isomorphism). Our construction is therefore a sort of generalized GLS,
though it is not the higher-degree generalization anticipated by Galbraith, Lin,
and Scott themselves, which composes the sub-Frobenius with a non-rational
separable isogeny and its dual isogeny (cf. [11, Theorem 1]).

In §4 we prove that we can immediately obtain scalar decompositions of the
same bitlength as GLS for curves over the same fields: the decompositions
produced by Proposition 2 are identical to the GLS decompositions of [11, Lemma 2]
when d = 1, up to sign. For this reason, we do not provide extensive
implementation details in this paper: while our endomorphisms cost a few more
$\mathbb{F}_q$-operations to evaluate than the GLS endomorphism, this evaluation is typically
carried out only once per scalar multiplication. This evaluation is the only
difference between a GLS scalar multiplication and one of ours: the subsequent
multiexponentiations have exactly the same length as in GLS, and the
underlying curve and field arithmetic is the same, too.

2 Notation and Conventions 

Throughout, we work over fields of characteristic not 2 or 3. Let

$\mathcal{E} : y^2 = x^3 + a_4 x + a_6$

be an elliptic curve over such a field K.

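Galois conjugates. For every automorphism $\sigma$ of K, we define the conjugate
curve

${}^\sigma\mathcal{E} : y^2 = x^3 + {}^\sigma a_4 x + {}^\sigma a_6$ .

If $\phi : \mathcal{E} \to \mathcal{E}_1$ is an isogeny, then we obtain a conjugate isogeny ${}^\sigma\phi : {}^\sigma\mathcal{E} \to {}^\sigma\mathcal{E}_1$
by applying $\sigma$ to the defining equations of $\phi$, $\mathcal{E}$, and $\mathcal{E}_1$.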

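Quadratic twists. For every $\lambda \ne 0$ in $\overline{K}$, we define a twisting isomorphism

$\delta(\lambda) : \mathcal{E} \to \mathcal{E}^\lambda : y^2 = x^3 + \lambda^4 a_4 x + \lambda^6 a_6$

by

$\delta(\lambda) : (x, y) \mapsto (\lambda^2 x, \lambda^3 y)$ .

The twist $\mathcal{E}^\lambda$ is defined over $K(\lambda^2)$, and $\delta(\lambda)$ is defined over $K(\lambda)$.⁵

For every K-endomorphism $\psi$ of $\mathcal{E}$, there is a twisted $K(\lambda^2)$-endomorphism

$\psi^\lambda := \delta(\lambda)\,\psi\,\delta(\lambda^{-1})$

of $\mathcal{E}^\lambda$. Observe that $\delta(\lambda_1)\delta(\lambda_2) = \delta(\lambda_1\lambda_2)$ for any $\lambda_1, \lambda_2$ in $\overline{K}$, and $\delta(-1) = [-1]$.
Also, ${}^\sigma(\mathcal{E}^\lambda) = ({}^\sigma\mathcal{E})^{{}^\sigma\lambda}$ for all automorphisms $\sigma$ of $\overline{K}$.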

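If $\mu$ is a nonsquare in K, then $\mathcal{E}^{\sqrt{\mu}}$ is a quadratic twist of $\mathcal{E}$. If $K = \mathbb{F}_q$,
then $\mathcal{E}^{\sqrt{\mu_1}}$ and $\mathcal{E}^{\sqrt{\mu_2}}$ are $\mathbb{F}_q$-isomorphic for all nonsquares $\mu_1, \mu_2$ in $\mathbb{F}_q$ (the
isomorphism $\delta(\sqrt{\mu_1/\mu_2})$ is defined over $\mathbb{F}_q$ because $\mu_1/\mu_2$ must be a square).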


Throughout, conjugates are marked by left-superscripts, twists by right-superscripts. 
When the choice of nonsquare is not important, $\mathcal{E}'$ denotes the quadratic twist.
Similarly, if $\psi$ is an $\mathbb{F}_q$-endomorphism of $\mathcal{E}$, then $\psi'$ denotes the corresponding
twisted $\mathbb{F}_q$-endomorphism of $\mathcal{E}'$.

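The trace. If $K = \mathbb{F}_q$, then $\pi_{\mathcal{E}}$ denotes the q-power Frobenius endomorphism
of $\mathcal{E}$. Recall that the characteristic polynomial of $\pi_{\mathcal{E}}$ has the form

$\chi_{\mathcal{E}}(T) = T^2 - \mathrm{tr}(\mathcal{E})T + q$, with $|\mathrm{tr}(\mathcal{E})| \le 2\sqrt{q}$ .

The trace $\mathrm{tr}(\mathcal{E})$ of $\mathcal{E}$ satisfies $\#\mathcal{E}(\mathbb{F}_q) = q + 1 - \mathrm{tr}(\mathcal{E})$ and $\mathrm{tr}(\mathcal{E}') = -\mathrm{tr}(\mathcal{E})$.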
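
These two relations are easy to confirm numerically over a small prime field. The sketch below is illustrative only: it counts points by brute force, which is feasible only for tiny p, and the parameters used in the usage note (p = 101, a_4 = 2, a_6 = 3) are arbitrary choices, not curves from this paper. The twist is taken by the square root of the smallest nonsquare mu, so its coefficients are mu^2 a_4 and mu^3 a_6.

```python
def count_points(p, a4, a6):
    """#E(F_p) for E: y^2 = x^3 + a4*x + a6, including the point at infinity."""
    # sqrt_count[v] = number of y in F_p with y^2 = v
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    return 1 + sum(sqrt_count[(x**3 + a4 * x + a6) % p] for x in range(p))

def quadratic_twist(p, a4, a6):
    """Coefficients of the twist by sqrt(mu), for the smallest nonsquare mu mod p."""
    mu = next(n for n in range(2, p) if pow(n, (p - 1) // 2, p) == p - 1)
    return (mu**2 * a4) % p, (mu**3 * a6) % p

def trace(p, a4, a6):
    """tr(E) = p + 1 - #E(F_p)."""
    return p + 1 - count_points(p, a4, a6)
```

For any such curve, trace(p, a4, a6) and the trace of its twist are negatives of each other, and the squared trace is at most 4p.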

p-th powering. We write (p) for the p-th powering automorphism of $\overline{\mathbb{F}}_p$. Note
that (p) is almost trivial to compute on $\mathbb{F}_{p^2} = \mathbb{F}_p(\sqrt{\Delta})$, because ${}^{(p)}(a + b\sqrt{\Delta}) =
a - b\sqrt{\Delta}$ for all a and b in $\mathbb{F}_p$.
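
A short sketch makes the conjugation rule concrete. Elements of the quadratic extension are modelled as pairs (a, b) standing for a + b sqrt(Delta); the function names are hypothetical, and the choice Delta = -1 with p = 2^127 - 1 in the usage note is an assumption that is valid because this p is congruent to 3 modulo 4, so -1 is a nonsquare.

```python
def fp2_mul(x, y, p, delta):
    """Multiply (a + b*sqrt(delta)) * (c + d*sqrt(delta)) in F_p(sqrt(delta))."""
    a, b = x
    c, d = y
    return ((a * c + b * d * delta) % p, (a * d + b * c) % p)

def fp2_pow(x, e, p, delta):
    """Square-and-multiply exponentiation in F_p(sqrt(delta)) for e >= 0."""
    result = (1, 0)
    while e:
        if e & 1:
            result = fp2_mul(result, x, p, delta)
        x = fp2_mul(x, x, p, delta)
        e >>= 1
    return result

def conjugate(x, p):
    """The p-th power map written as a sign flip: a + b*sqrt(delta) -> a - b*sqrt(delta)."""
    a, b = x
    return (a, (-b) % p)
```

Raising an element to the p-th power with fp2_pow costs on the order of log p multiplications, whereas conjugate costs a single negation; the two agree whenever delta is a nonsquare modulo p.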

3 Quadratic Q-curves and Their Reductions 

Suppose $\mathcal{E}/\mathbb{Q}(\sqrt{\Delta})$ is a quadratic Q-curve of prime degree d (as in Definition 1),
where $\Delta$ is a discriminant prime to d, and let $\phi : \mathcal{E} \to {}^\sigma\mathcal{E}$ be the corresponding
d-isogeny. In general, $\phi$ is only defined over a quadratic extension $\mathbb{Q}(\sqrt{\Delta}, \gamma)$ of
$\mathbb{Q}(\sqrt{\Delta})$. We can compute $\gamma$ from $\Delta$ and $\ker\phi$ using [13, Proposition 3.1], but
after a suitable twist we can always reduce to the case where $\gamma = \sqrt{\pm d}$ (see [TUI
remark after Lemma 3.2]). The families of explicit Q-curves of degree d that
we treat below have their isogenies defined over $\mathbb{Q}(\sqrt{\Delta}, \sqrt{-d})$; so to simplify
matters, from now on we will

Assume $\phi$ is defined over $\mathbb{Q}(\sqrt{\Delta}, \sqrt{-d})$.

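Let p be a prime of good reduction for $\mathcal{E}$ that is inert in $\mathbb{Q}(\sqrt{\Delta})$ and prime to
d. If $\mathcal{O}_\Delta$ is the ring of integers of $\mathbb{Q}(\sqrt{\Delta})$, then

$\mathbb{F}_{p^2} = \mathcal{O}_\Delta/(p) = \mathbb{F}_p(\sqrt{\Delta})$ .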

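Looking at the Galois groups of our fields, we have a series of injections

$\langle (p) \rangle = \mathrm{Gal}(\mathbb{F}_p(\sqrt{\Delta})/\mathbb{F}_p) \hookrightarrow \mathrm{Gal}(\mathbb{Q}(\sqrt{\Delta})/\mathbb{Q}) \hookrightarrow \mathrm{Gal}(\mathbb{Q}(\sqrt{\Delta}, \sqrt{-d})/\mathbb{Q})$ .

The image of (p) in $\mathrm{Gal}(\mathbb{Q}(\sqrt{\Delta})/\mathbb{Q})$ is $\sigma$, because p is inert in $\mathbb{Q}(\sqrt{\Delta})$. When
extending $\sigma$ to an automorphism of $\mathbb{Q}(\sqrt{\Delta}, \sqrt{-d})$, we extend it to be the image
of (p): that is,

$\sigma\left(\alpha + \beta\sqrt{\Delta} + \gamma\sqrt{-d} + \delta\sqrt{-d\Delta}\right) = \alpha - \beta\sqrt{\Delta} + (-d/p)\left(\gamma\sqrt{-d} - \delta\sqrt{-d\Delta}\right)$    (1)

for all $\alpha, \beta, \gamma$, and $\delta \in \mathbb{Q}$. (Recall that the Legendre symbol (n/p) is 1 if n is a
square mod p, -1 if n is not a square mod p, and 0 if p divides n.)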
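
For an odd prime p, the Legendre symbol in Eq. (1) can be evaluated with Euler's criterion. The helper below is a minimal sketch; the function name is hypothetical.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion.

    Returns 1 if n is a nonzero square mod p, -1 if n is a nonsquare mod p,
    and 0 if p divides n.
    """
    s = pow(n % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s
```

For instance, with d = 2 and p = 7 the sign (-d/p) in Eq. (1) is legendre(-2, 7), which is -1 because -2 is congruent to 5 and the nonzero squares modulo 7 are 1, 2 and 4.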

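Now let $\widetilde{\mathcal{E}}/\mathbb{F}_{p^2}$ be the reduction modulo p of $\mathcal{E}$. The curve ${}^\sigma\mathcal{E}$ reduces to ${}^{(p)}\widetilde{\mathcal{E}}$,
while the d-isogeny $\phi : \mathcal{E} \to {}^\sigma\mathcal{E}$ reduces to a d-isogeny $\widetilde{\phi} : \widetilde{\mathcal{E}} \to {}^{(p)}\widetilde{\mathcal{E}}$ over $\mathbb{F}_{p^2}$.
Applying $\sigma$ to $\phi$, we obtain a second d-isogeny ${}^\sigma\phi : {}^\sigma\mathcal{E} \to \mathcal{E}$ travelling in the
opposite direction, which reduces mod p to a conjugate isogeny ${}^{(p)}\widetilde{\phi} : {}^{(p)}\widetilde{\mathcal{E}} \to \widetilde{\mathcal{E}}$
over $\mathbb{F}_{p^2}$. Composing ${}^\sigma\phi$ with $\phi$ yields endomorphisms ${}^\sigma\phi \circ \phi$ of $\mathcal{E}$ and $\phi \circ {}^\sigma\phi$
of ${}^\sigma\mathcal{E}$, each of degree $d^2$. But (by definition) $\mathcal{E}$ and ${}^\sigma\mathcal{E}$ do not have CM, so all
of their endomorphisms are integer multiplications; and since the only integer
multiplications of degree $d^2$ are [d] and [-d], we conclude that

${}^\sigma\phi \circ \phi = [\epsilon_p d]_{\mathcal{E}}$ and $\phi \circ {}^\sigma\phi = [\epsilon_p d]_{{}^\sigma\mathcal{E}}$ , where $\epsilon_p \in \{\pm 1\}$ .

Technically, ${}^\sigma\phi$ and $\phi$ are (up to sign) the dual isogenies of $\phi$ and ${}^\sigma\phi$,
respectively. The sign $\epsilon_p$ depends on p (as well as on $\phi$): if $\tau$ is the extension of $\sigma$
to $\mathbb{Q}(\sqrt{\Delta}, \sqrt{-d})$ that is not the image of (p), then ${}^\tau\phi \circ \phi = [-\epsilon_p d]_{\mathcal{E}}$. Reducing
modulo p, we see that

${}^{(p)}\widetilde{\phi} \circ \widetilde{\phi} = [\epsilon_p d]_{\widetilde{\mathcal{E}}}$ and $\widetilde{\phi} \circ {}^{(p)}\widetilde{\phi} = [\epsilon_p d]_{{}^{(p)}\widetilde{\mathcal{E}}}$ .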

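The map $(x, y) \mapsto (x^p, y^p)$ defines p-isogenies

$\pi_0 : {}^{(p)}\widetilde{\mathcal{E}} \to \widetilde{\mathcal{E}}$ and ${}^{(p)}\pi_0 : \widetilde{\mathcal{E}} \to {}^{(p)}\widetilde{\mathcal{E}}$ .

Clearly, ${}^{(p)}\pi_0 \circ \pi_0$ (resp. $\pi_0 \circ {}^{(p)}\pi_0$) is the $p^2$-power Frobenius endomorphism of ${}^{(p)}\widetilde{\mathcal{E}}$
(resp. $\widetilde{\mathcal{E}}$). Composing $\pi_0$ with $\widetilde{\phi}$ yields a degree-pd endomorphism

$\psi := \pi_0 \circ \widetilde{\phi} \in \mathrm{End}(\widetilde{\mathcal{E}})$ .

If d is very small (say, less than 10) then $\psi$ is efficient because $\widetilde{\phi}$ is defined by
polynomials of degree about d, and $\pi_0$ acts as a simple conjugation on coordinates
in $\mathbb{F}_{p^2}$, as in Eq. (1). (The efficiency of $\psi$ depends primarily on its separable
degree, d, and not on the inseparable part p.)

We also obtain an endomorphism $\psi'$ on the quadratic twist $\widetilde{\mathcal{E}}'$ of $\widetilde{\mathcal{E}}$. Indeed,
if $\widetilde{\mathcal{E}}' = \widetilde{\mathcal{E}}^{\sqrt{\mu}}$, then $\psi' = \psi^{\sqrt{\mu}}$, and $\psi'$ is defined over $\mathbb{F}_{p^2}$.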

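Proposition 1. With the notation above:

$\psi^2 = [\epsilon_p d]\pi_{\widetilde{\mathcal{E}}}$ and $(\psi')^2 = [-\epsilon_p d]\pi_{\widetilde{\mathcal{E}}'}$ .

There exists an integer r satisfying $dr^2 = 2p + \epsilon_p \mathrm{tr}(\widetilde{\mathcal{E}})$ such that

$\psi = \frac{1}{r}\left(\pi_{\widetilde{\mathcal{E}}} + \epsilon_p p\right)$ and $\psi' = \frac{-1}{r}\left(\pi_{\widetilde{\mathcal{E}}'} - \epsilon_p p\right)$ .

The characteristic polynomial of both $\psi$ and $\psi'$ is

$P_\psi(T) = P_{\psi'}(T) = T^2 - \epsilon_p r d T + dp$ .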

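Proof. Clearly $\pi_0 \circ \widetilde{\phi} = {}^{(p)}\widetilde{\phi} \circ {}^{(p)}\pi_0$, so

$\psi^2 = \pi_0 \widetilde{\phi} \pi_0 \widetilde{\phi} = \pi_0 \widetilde{\phi}\, {}^{(p)}\widetilde{\phi}\, {}^{(p)}\pi_0 = \pi_0 [\epsilon_p d]\, {}^{(p)}\pi_0 = [\epsilon_p d] \pi_0\, {}^{(p)}\pi_0 = [\epsilon_p d]\pi_{\widetilde{\mathcal{E}}}$ .

Choosing a nonsquare $\mu$ in $\mathbb{F}_{p^2}$, so $\widetilde{\mathcal{E}}' = \widetilde{\mathcal{E}}^{\sqrt{\mu}}$ and $\psi' = \psi^{\sqrt{\mu}}$, we find

$(\psi')^2 = \delta(\mu^{1/2})\,\psi^2\,\delta(\mu^{-1/2}) = \delta(\mu^{1/2})[\epsilon_p d]\pi_{\widetilde{\mathcal{E}}}\,\delta(\mu^{-1/2})$

$= \delta(\mu^{\frac{1}{2}(1 - p^2)})[\epsilon_p d]\pi_{\widetilde{\mathcal{E}}'} = \delta(-1)[\epsilon_p d]\pi_{\widetilde{\mathcal{E}}'} = [-\epsilon_p d]\pi_{\widetilde{\mathcal{E}}'}$ .

Using $\pi_{\widetilde{\mathcal{E}}}^2 - \mathrm{tr}(\widetilde{\mathcal{E}})\pi_{\widetilde{\mathcal{E}}} + p^2 = 0$ and $\pi_{\widetilde{\mathcal{E}}'}^2 + \mathrm{tr}(\widetilde{\mathcal{E}})\pi_{\widetilde{\mathcal{E}}'} + p^2 = 0$, we verify that the
expressions for $\psi$ and $\psi'$ give the two square roots of $\epsilon_p d\pi_{\widetilde{\mathcal{E}}}$ in $\mathbb{Q}(\pi_{\widetilde{\mathcal{E}}})$, and $-\epsilon_p d\pi_{\widetilde{\mathcal{E}}'}$
in $\mathbb{Q}(\pi_{\widetilde{\mathcal{E}}'})$, and that the claimed characteristic polynomial is satisfied. □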

Now we just need a source of quadratic Q-curves of small degree. Elkies [7] shows
that all Q-curves correspond to rational points on certain modular curves: Let
$X^*(d)$ be the quotient of the modular curve $X_0(d)$ by all of its Atkin-Lehner
involutions, let K be a quadratic field, and let $\sigma$ be the involution of K over $\mathbb{Q}$.
If e is a point in $X^*(d)(\mathbb{Q})$ and E is a preimage of e in $X_0(d)(K) \setminus X_0(d)(\mathbb{Q})$,
then E parametrizes (up to $\overline{\mathbb{Q}}$-isomorphism) a d-isogeny $\phi : \mathcal{E} \to {}^\sigma\mathcal{E}$ over K.

Luckily enough, for very small d, the curves $X_0(d)$ and $X^*(d)$ have genus
zero, so not only do we get plenty of rational points on $X^*(d)$, we get a whole
one-parameter family of Q-curves of degree d. Hasegawa gives explicit universal
curves for d = 2, 3, and 7 in [16, Theorem 2.2]: for each squarefree integer $\Delta \ne 1$,
every Q-curve of degree d = 2, 3, 7 over $\mathbb{Q}(\sqrt{\Delta})$ is $\overline{\mathbb{Q}}$-isomorphic to a rational
specialization of one of these families. Hasegawa's curves for d = 2 and 3 ($\mathcal{E}_{2,\Delta,s}$
in §5 and $\mathcal{E}_{3,\Delta,s}$ in §6) suffice not only to illustrate our ideas, but also to give
useful practical examples.

4 Short Scalar Decompositions 

Before moving on to concrete constructions, we will show that the endomorphisms
developed in §3 yield short scalar decompositions. Proposition 2 below
gives explicit formulae for producing decompositions of at most $\lceil \log_2 p \rceil$ bits.

Suppose $\mathcal{G}$ is a cyclic subgroup of $\widetilde{\mathcal{E}}(\mathbb{F}_{p^2})$ such that $\psi(\mathcal{G}) = \mathcal{G}$; let $N = \#\mathcal{G}$.
Proposition 1 shows that $\psi$ acts as a square root of $\epsilon_p d$ on $\mathcal{G}$: its eigenvalue is

$\lambda_\psi \equiv (1 + \epsilon_p p)/r \pmod{N}$ .    (2)

We want to compute a decomposition

$m \equiv a + b\lambda_\psi \pmod{N}$

so as to efficiently compute

$[m]P = [a]P + [b\lambda_\psi]P = [a]P + [b]\psi(P)$ .

The decomposition of m is not unique: far from it. The set of all decompositions
(a, b) of m is the coset $(m, 0) + \mathcal{L}$, where

$\mathcal{L} := \langle (N, 0), (-\lambda_\psi, 1) \rangle \subseteq \mathbb{Z}^2$

is the lattice of decompositions of 0 (that is, of (a, b) such that $a + b\lambda_\psi \equiv 0
\pmod{N}$).

We want to find a decomposition where a and b have minimal bitlength: that
is, where $\lceil \log_2 \|(a, b)\|_\infty \rceil$ is as small as possible. The standard technique is to
(pre)-compute a short basis of $\mathcal{L}$, then use Babai rounding [1] to transform each
scalar m into a short decomposition (a, b). The following lemma outlines this
process; for further detail and analysis, see [12, §4] and [10, §18.2].

Lemma 1. Let $e_1$, $e_2$ be linearly independent vectors in $\mathcal{L}$. Let $m$ be an integer,
and set

$$(a, b) := (m, 0) - \lfloor\alpha\rceil e_1 - \lfloor\beta\rceil e_2 ,$$

where $(\alpha, \beta)$ is the (unique) solution in $\mathbb{Q}^2$ to the linear system $(m, 0) = \alpha e_1 + \beta e_2$. Then

$$m \equiv a + \lambda_\psi b \pmod{N} \quad\text{and}\quad \|(a, b)\|_\infty \le \max(\|e_1\|_\infty, \|e_2\|_\infty) .$$

Proof. This is just [12, Lemma 2] (under the infinity norm). □
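The rounding step of Lemma 1 can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: the modulus `N`, the eigenvalue `lam` and the basis `E1`, `E2` are small hypothetical values chosen so that both vectors lie in the lattice, and `round_div` and `babai_decompose` are illustrative names.

```python
def round_div(n, d):
    """Nearest integer to n/d (ties rounded up), in exact integer arithmetic."""
    if d < 0:
        n, d = -n, -d
    return (2 * n + d) // (2 * d)


def babai_decompose(m, e1, e2):
    """Return (a, b) = (m, 0) - round(alpha)*e1 - round(beta)*e2."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    # Cramer's rule for the rational solution of (m, 0) = alpha*e1 + beta*e2.
    alpha = round_div(m * e2[1], det)
    beta = round_div(-m * e1[1], det)
    a = m - alpha * e1[0] - beta * e2[0]
    b = -alpha * e1[1] - beta * e2[1]
    return a, b


# Hypothetical toy lattice: both vectors satisfy x + y*lam = 0 (mod N).
N, lam = 10007, 1234
E1, E2 = (135, 8), (-19, 73)
```

Because only the basis depends on $N$ and $\lambda_\psi$, the determinant and the two numerators' fixed factors can be precomputed once per curve.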

We see that better decompositions of $m$ correspond to shorter bases for $\mathcal{L}$. If $|\lambda_\psi|$
is not unusually small, then we can compute a basis for $\mathcal{L}$ of size $O(\sqrt{N})$ using
the Gauss reduction or Euclidean algorithms (cf. [12, §4] and [10, §17.1.1]).^6 The
basis depends only on $N$ and $\lambda_\psi$, so it can be precomputed.

In our case, lattice reduction is unnecessary: we can immediately write down
two linearly independent vectors in $\mathcal{L}$ that are "short enough", and thus give
explicit formulae for $(a, b)$ in terms of $m$. These decompositions have length
$\lceil \log_2 p \rceil$, which is near-optimal in cryptographic contexts: if $N \sim \#\mathcal{E}(\mathbb{F}_{p^2}) \sim p^2$,
then $\log_2 p \sim \frac{1}{2} \log_2 N$.

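Proposition 2. With the notation above: given an integer $m$, let

$$a = m - \lfloor m(1 + \epsilon_p p)/\#\mathcal{E}(\mathbb{F}_{p^2}) \rceil (1 + \epsilon_p p) + \lfloor mr/\#\mathcal{E}(\mathbb{F}_{p^2}) \rceil \epsilon_p d r \quad\text{and}$$

$$b = \lfloor m(1 + \epsilon_p p)/\#\mathcal{E}(\mathbb{F}_{p^2}) \rceil r - \lfloor mr/\#\mathcal{E}(\mathbb{F}_{p^2}) \rceil (1 + \epsilon_p p) .$$

Then, assuming $d \ll p$ and $m \not\equiv 0 \pmod{N}$, we have

$$m \equiv a + b\lambda_\psi \pmod{N} \quad\text{and}\quad \lceil \log_2 \|(a, b)\|_\infty \rceil \le \lceil \log_2 p \rceil .$$

Proof. Eq. (2) yields $r\lambda_\psi \equiv 1 + \epsilon_p p \pmod{N}$ and $r\epsilon_p d \equiv (1 + \epsilon_p p)\lambda_\psi \pmod{N}$,
so $e_1 = (1 + \epsilon_p p, -r)$ and $e_2 = (-\epsilon_p d r, 1 + \epsilon_p p)$ are in $\mathcal{L}$ (they generate a sublattice
of determinant $\#\mathcal{E}(\mathbb{F}_{p^2})$). Applying Lemma 1 with $\alpha = m(1 + \epsilon_p p)/\#\mathcal{E}(\mathbb{F}_{p^2})$ and
$\beta = mr/\#\mathcal{E}(\mathbb{F}_{p^2})$, we see that $m \equiv a + b\lambda_\psi \pmod{N}$ and $\|(a, b)\|_\infty \le \|e_2\|_\infty$.
But $d|r| \le 2\sqrt{dp}$ (since $|\mathrm{tr}(\mathcal{E})| \le 2p$) and $d \ll p$, so $\|e_2\|_\infty = p + \epsilon_p$. The result
follows on taking logs, and noting that $\lceil \log_2(p \pm 1) \rceil \le \lceil \log_2 p \rceil$ (since $p > 3$). □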
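The closed formulae of Proposition 2 translate directly into integer arithmetic. The sketch below is a numerical consistency check only: the parameters `p`, `eps`, `d`, `r` are synthetic values chosen to satisfy $|\mathrm{tr}(\mathcal{E})| \le 2p$ and are not claimed to come from an actual curve, the subgroup order is taken to be the full group order for simplicity, and the function names are illustrative.

```python
def round_div(n, d):
    """Nearest integer to n/d for d > 0 (ties rounded up), exactly."""
    return (2 * n + d) // (2 * d)


def decompose(m, p, eps, d, r):
    """Return (a, b) following the formulae of Proposition 2."""
    # The group order written as the determinant (1 + eps*p)^2 - eps*d*r^2.
    order = (1 + eps * p) ** 2 - eps * d * r * r
    alpha = round_div(m * (1 + eps * p), order)
    beta = round_div(m * r, order)
    a = m - alpha * (1 + eps * p) + beta * eps * d * r
    b = alpha * r - beta * (1 + eps * p)
    return a, b


# Synthetic toy parameters (assumption): p = 5 (mod 8), so eps = +1 when d = 2.
p, eps, d, r = 101, 1, 2, 5
N = (1 + eps * p) ** 2 - eps * d * r * r   # stands in for the subgroup order
lam = (1 + eps * p) * pow(r, -1, N) % N     # eigenvalue from Eq. (2)
```

For example, `decompose(1000, p, eps, d, r)` rounds $\alpha \approx 9.85$ to 10 and $\beta \approx 0.48$ to 0, giving a pair whose entries are far shorter than the 14-bit modulus.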

6 General bounds on the constant hidden by the $O(\cdot)$ are derived in [26], but they are
suboptimal for our endomorphisms in cryptographic contexts, where Proposition 2
gives better results.

5 Endomorphisms from Quadratic Q-curves of Degree 2


Let $\Delta$ be a squarefree integer. Hasegawa defines a one-parameter family of elliptic
curves over $\mathbb{Q}(\sqrt{\Delta})$ by

$$\mathcal{E}_{2,\Delta,s} : y^2 = x^3 - 6(5 - 3s\sqrt{\Delta})x + 8(7 - 9s\sqrt{\Delta}) , \qquad (3)$$

where $s$ is a free parameter taking values in $\mathbb{Q}$ [15, Theorem 2.2]. The discriminant
of $\mathcal{E}_{2,\Delta,s}$ is $2^9 \cdot 3^6 (1 - s^2\Delta)(1 + s\sqrt{\Delta})$, so $\mathcal{E}_{2,\Delta,s}$ has good reduction at every
$p > 3$ with $(\Delta/p) = -1$, for every $s$ in $\mathbb{Q}$.

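The curve $\mathcal{E}_{2,\Delta,s}$ has a rational 2-torsion point $(4, 0)$, which generates the
kernel of a 2-isogeny $\phi_{2,\Delta,s} : \mathcal{E}_{2,\Delta,s} \to {}^\sigma\mathcal{E}_{2,\Delta,s}$ defined over $\mathbb{Q}(\sqrt{\Delta}, \sqrt{-2})$. We
construct $\phi_{2,\Delta,s}$ explicitly: Vélu's formulae [30] define the (normalized) quotient
$\mathcal{E}_{2,\Delta,s} \to \mathcal{E}_{2,\Delta,s}/\langle(4, 0)\rangle$, and then the isomorphism $\mathcal{E}_{2,\Delta,s}/\langle(4, 0)\rangle \to {}^\sigma\mathcal{E}_{2,\Delta,s}$
is the quadratic twist $\delta(1/\sqrt{-2})$. Composing, we obtain an expression for the
isogeny as a rational map:

$$\phi_{2,\Delta,s} : (x, y) \longmapsto \left( \frac{-x}{2} - \frac{9(1 + s\sqrt{\Delta})}{x - 4} ,\; \frac{y}{\sqrt{-2}} \left( \frac{-1}{2} + \frac{9(1 + s\sqrt{\Delta})}{(x - 4)^2} \right) \right) .$$

Conjugating and composing, we see that ${}^\sigma\phi_{2,\Delta,s}\,\phi_{2,\Delta,s} = [2]$ if $\sigma(\sqrt{-2}) = -\sqrt{-2}$,
and $[-2]$ if $\sigma(\sqrt{-2}) = \sqrt{-2}$; that is, the sign function for $\phi_{2,\Delta,s}$ is

$$\epsilon_p = \begin{cases} +1 & \text{if } p \equiv 5, 7 \pmod 8 , \\ -1 & \text{if } p \equiv 1, 3 \pmod 8 . \end{cases} \qquad (4)$$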


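Theorem 1. Let $p > 3$ be a prime, and define $\epsilon_p$ as in Eq. (4). Let $\Delta$ be a
nonsquare^7 in $\mathbb{F}_p$, so $\mathbb{F}_{p^2} = \mathbb{F}_p(\sqrt{\Delta})$. For each $s$ in $\mathbb{F}_p$, let

$$C_{2,\Delta}(s) := 9(1 + s\sqrt{\Delta})$$

and let $\mathcal{E}_{2,\Delta,s}$ be the elliptic curve over $\mathbb{F}_{p^2}$ defined by

$$\mathcal{E}_{2,\Delta,s} : y^2 = x^3 + 2(C_{2,\Delta}(s) - 24)x - 8(C_{2,\Delta}(s) - 16) .$$

Then $\mathcal{E}_{2,\Delta,s}$ has an efficient $\mathbb{F}_{p^2}$-endomorphism of degree $2p$ defined by

$$\psi_{2,\Delta,s} : (x, y) \longmapsto \left( \frac{-x^p}{2} - \frac{C_{2,\Delta}(s)^p}{x^p - 4} ,\; \frac{y^p}{\sqrt{-2}} \left( \frac{-1}{2} + \frac{C_{2,\Delta}(s)^p}{(x^p - 4)^2} \right) \right) ,$$

and there exists an integer $r$ satisfying $2r^2 = 2p + \epsilon_p \mathrm{tr}(\mathcal{E}_{2,\Delta,s})$ such that

$$\psi_{2,\Delta,s} = \frac{1}{r}\left(\pi_{\mathcal{E}_{2,\Delta,s}} + \epsilon_p p\right) \quad\text{and}\quad \psi_{2,\Delta,s}^2 = [\epsilon_p 2]\,\pi_{\mathcal{E}_{2,\Delta,s}} .$$

The twisted endomorphism $\psi'_{2,\Delta,s}$ on $\mathcal{E}'_{2,\Delta,s}$ satisfies $\psi'_{2,\Delta,s} = \frac{1}{r}\left(-\pi_{\mathcal{E}'_{2,\Delta,s}} + \epsilon_p p\right)$
and $(\psi'_{2,\Delta,s})^2 = [-\epsilon_p 2]\,\pi_{\mathcal{E}'_{2,\Delta,s}}$. The characteristic polynomial of $\psi_{2,\Delta,s}$ and $\psi'_{2,\Delta,s}$
is $P_{2,\Delta,s}(T) = T^2 - \epsilon_p r T + 2p$.

Proof. Reduce $\mathcal{E}_{2,\Delta,s}$ and $\phi_{2,\Delta,s}$ mod $p$ and compose with $\pi_0$ as in §3, then apply
Proposition 1 using Eq. (4). □

7 The choice of $\Delta$ is (theoretically) irrelevant, since all quadratic extensions of $\mathbb{F}_p$ are
isomorphic. If $\Delta$ and $\Delta'$ are two nonsquares in $\mathbb{F}_p$, then $\Delta/\Delta' = a^2$ for some $a$ in $\mathbb{F}_p$,
so $\mathcal{E}_{2,\Delta,s}$ and $\mathcal{E}_{2,\Delta',as}$ are identical. We are therefore free to choose any practically
convenient value for $\Delta$, such as one permitting faster arithmetic in $\mathbb{F}_p(\sqrt{\Delta})$.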

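If $\mathcal{G} \subset \mathcal{E}_{2,\Delta,s}(\mathbb{F}_{p^2})$ is a cyclic subgroup of order $N$ such that $\psi_{2,\Delta,s}(\mathcal{G}) = \mathcal{G}$,
then the eigenvalue of $\psi_{2,\Delta,s}$ on $\mathcal{G}$ is

$$\lambda_{2,\Delta,s} \equiv \frac{1}{r}(1 + \epsilon_p p) \equiv \pm\sqrt{\epsilon_p 2} \pmod{N} .$$

Applying Proposition 2, we can decompose scalar multiplications in $\mathcal{G}$ as $[m]P =
[a]P + [b]\psi_{2,\Delta,s}(P)$ where $a$ and $b$ have at most $\lceil \log_2 p \rceil$ bits.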
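As a sanity check on the shape of the map in Theorem 1, the sketch below builds a toy instance over a tiny field and confirms that the image of every affine point outside the kernel lies on the same curve. Everything concrete here is an assumption made for illustration: the prime `P = 11`, the nonsquare `DELTA = 2`, the parameter `S = 3`, the pair representation of $\mathbb{F}_{p^2}$, and all function names. It does not verify the degree or the eigenvalue, only that the rational map is an endomorphism of the point set.

```python
P, DELTA, S = 11, 2, 3   # toy prime, nonsquare mod P, parameter s (assumptions)

# Elements of F_{p^2} = F_p(sqrt(DELTA)) are pairs (a, b) meaning a + b*sqrt(DELTA).
def add(u, v): return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)
def sub(u, v): return ((u[0] - v[0]) % P, (u[1] - v[1]) % P)
def mul(u, v):
    return ((u[0] * v[0] + DELTA * u[1] * v[1]) % P, (u[0] * v[1] + u[1] * v[0]) % P)
def inv(u):
    n = pow((u[0] * u[0] - DELTA * u[1] * u[1]) % P, -1, P)  # inverse of the norm
    return (u[0] * n % P, -u[1] * n % P)
def frob(u): return (u[0], -u[1] % P)   # p-power Frobenius: sqrt(DELTA) -> -sqrt(DELTA)
def const(a): return (a % P, 0)

FIELD = [(a, b) for a in range(P) for b in range(P)]
FOUR, HALF = const(4), const(pow(2, -1, P))
C = (9 % P, 9 * S % P)                                   # C = 9(1 + s*sqrt(DELTA))
A = mul(const(2), sub(C, const(24)))                     # coefficient of x
B = mul(const(-8), sub(C, const(16)))                    # constant term
SQRT_M2 = next(z for z in FIELD if mul(z, z) == const(-2))  # a square root of -2

def on_curve(x, y):
    return mul(y, y) == add(add(mul(x, mul(x, x)), mul(A, x)), B)

def psi(x, y):
    """Apply the degree-2p map of Theorem 1 to an affine point with x != 4."""
    xp, yp, cp = frob(x), frob(y), frob(C)
    d = inv(sub(xp, FOUR))
    x_new = sub(sub(const(0), mul(HALF, xp)), mul(cp, d))
    y_new = mul(mul(yp, inv(SQRT_M2)), sub(mul(cp, mul(d, d)), HALF))
    return x_new, y_new

points = [(x, y) for x in FIELD for y in FIELD if on_curve(x, y)]
```

The kernel point $(4, 0)$ is excluded because it maps to the point at infinity, which this affine sketch does not represent.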

Proposition 3. Theorem 1 yields at least $p - 3$ non-isomorphic curves over $\mathbb{F}_{p^2}$
(and at least $2p - 6$ non-$\mathbb{F}_{p^2}$-isomorphic curves, if we count the quadratic twists)
equipped with efficient endomorphisms.

Proof. It suffices to show that the $j$-invariant $j(\mathcal{E}_{2,\Delta,s}) = \frac{2^6 (5 - 3s\sqrt{\Delta})^3}{(1 - s^2\Delta)(1 + s\sqrt{\Delta})}$ takes
at least $p - 3$ distinct values in $\mathbb{F}_{p^2}$ as $s$ ranges over $\mathbb{F}_p$. If $j(\mathcal{E}_{2,\Delta,s_1}) = j(\mathcal{E}_{2,\Delta,s_2})$
with $s_1 \neq s_2$, then $s_1$ and $s_2$ satisfy $F_0(s_1, s_2) - 2\sqrt{\Delta}\,F_1(s_1, s_2) = 0$, where
$F_1(s_1, s_2) = (s_1 + s_2)(63\Delta s_1 s_2 - 65)$ and $F_0(s_1, s_2) = (\Delta s_1 s_2 + 1)(81\Delta s_1 s_2 -
175) + 49\Delta(s_1 + s_2)^2$ are polynomials over $\mathbb{F}_p$. If $s_1$ and $s_2$ are in $\mathbb{F}_p$, then we must
have $F_0(s_1, s_2) = F_1(s_1, s_2) = 0$. Solving the simultaneous equations, discarding
the solutions that can never be in $\mathbb{F}_p$, and dividing by two (since $(s_1, s_2)$ and
$(s_2, s_1)$ represent the same collision) yields at most 3 collisions $j(\mathcal{E}_{2,\Delta,s_1}) =
j(\mathcal{E}_{2,\Delta,s_2})$ with $s_1 \neq s_2$ in $\mathbb{F}_p$. □

We observe that ${}^\sigma\mathcal{E}_{2,\Delta,s} = \mathcal{E}_{2,\Delta,-s}$, so we do not gain any more isomorphism
classes in Proposition 3 by including the codomain curves.


6 Endomorphisms from Quadratic Q-curves of Degree 3 


Let $\Delta$ be a squarefree discriminant; Hasegawa defines a one-parameter family of
elliptic curves over $\mathbb{Q}(\sqrt{\Delta})$ by

$$\mathcal{E}_{3,\Delta,s} : y^2 = x^3 - 3(5 + 4s\sqrt{\Delta})x + 2(2s^2\Delta + 14s\sqrt{\Delta} + 11) , \qquad (5)$$

where $s$ is a free parameter taking values in $\mathbb{Q}$. As for the curves in §5, the curve
$\mathcal{E}_{3,\Delta,s}$ has good reduction at every inert $p > 3$ for every $s$ in $\mathbb{Q}$.

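The curve $\mathcal{E}_{3,\Delta,s}$ has a subgroup of order 3 defined by the polynomial $x - 3$,
consisting of 0 and $(3, \pm 2(1 - s\sqrt{\Delta}))$. Exactly as in §5, taking the Vélu quotient
and twisting by $1/\sqrt{-3}$ yields an explicit 3-isogeny $\phi_{3,\Delta,s} : \mathcal{E}_{3,\Delta,s} \to {}^\sigma\mathcal{E}_{3,\Delta,s}$; its
sign function is

$$\epsilon_p = \begin{cases} +1 & \text{if } p \equiv 2 \pmod 3 , \\ -1 & \text{if } p \equiv 1 \pmod 3 . \end{cases} \qquad (6)$$

Theorem 2. Let $p > 3$ be a prime, and define $\epsilon_p$ as in Eq. (6). Let $\Delta$ be a
nonsquare in $\mathbb{F}_p$, so $\mathbb{F}_{p^2} = \mathbb{F}_p(\sqrt{\Delta})$. For each $s$ in $\mathbb{F}_p$, let

$$C_{3,\Delta}(s) := 2(1 + s\sqrt{\Delta})$$

and let $\mathcal{E}_{3,\Delta,s}$ be the elliptic curve over $\mathbb{F}_{p^2}$ defined by

$$\mathcal{E}_{3,\Delta,s} : y^2 = x^3 - 3(2C_{3,\Delta}(s) + 1)x + (C_{3,\Delta}(s)^2 + 10C_{3,\Delta}(s) - 2) .$$

Then $\mathcal{E}_{3,\Delta,s}$ has an efficient $\mathbb{F}_{p^2}$-endomorphism $\psi_{3,\Delta,s}$ of degree $3p$, mapping
$(x, y)$ to

$$\left( \frac{-x^p}{3} - \frac{4C_{3,\Delta}(s)^p}{x^p - 3} - \frac{4C_{3,\Delta}(s)^{2p}}{3(x^p - 3)^2} ,\; \frac{y^p}{\sqrt{-3}} \left( \frac{-1}{3} + \frac{4C_{3,\Delta}(s)^p}{(x^p - 3)^2} + \frac{8C_{3,\Delta}(s)^{2p}}{3(x^p - 3)^3} \right) \right) ,$$

and there exists an integer $r$ satisfying $3r^2 = 2p + \epsilon_p \mathrm{tr}(\mathcal{E}_{3,\Delta,s})$ such that

$$\psi_{3,\Delta,s}^2 = [\epsilon_p 3]\,\pi_{\mathcal{E}_{3,\Delta,s}} \quad\text{and}\quad \psi_{3,\Delta,s} = \frac{1}{r}\left(\pi_{\mathcal{E}_{3,\Delta,s}} + \epsilon_p p\right) .$$

The twisted endomorphism $\psi'_{3,\Delta,s}$ on $\mathcal{E}'_{3,\Delta,s}$ satisfies $(\psi'_{3,\Delta,s})^2 = [-\epsilon_p 3]\,\pi_{\mathcal{E}'_{3,\Delta,s}}$
and $\psi'_{3,\Delta,s} = (-\pi_{\mathcal{E}'_{3,\Delta,s}} + \epsilon_p p)/r$. Both $\psi_{3,\Delta,s}$ and $\psi'_{3,\Delta,s}$ have characteristic
polynomial $P_{3,\Delta,s}(T) = T^2 - \epsilon_p r T + 3p$.

Proof. Reduce $\mathcal{E}_{3,\Delta,s}$ and $\phi_{3,\Delta,s}$ mod $p$, compose with $\pi_0$ as in §3, and apply
Proposition 1 using Eq. (6). □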

Proposition 4. Theorem 2 yields at least $p - 8$ non-isomorphic curves over $\mathbb{F}_{p^2}$
(and counting quadratic twists, at least $2p - 16$ non-$\mathbb{F}_{p^2}$-isomorphic curves)
equipped with efficient endomorphisms.

Proof. The proof is exactly as for Proposition 3. □

7 Cryptographic-Sized Curves 

We will now exhibit some curves with cryptographic parameter sizes, and secure
and twist-secure group orders. We computed the curve orders below using
Magma's implementation of the Schoof-Elkies-Atkin algorithm [2511914].

First consider the degree-2 curves of §5. By definition, $\mathcal{E}_{2,\Delta,s}$ and its quadratic
twist $\mathcal{E}'_{2,\Delta,s}$ have points of order 2 over $\mathbb{F}_{p^2}$: they generate the kernels of our
endomorphisms. If $p \equiv 2 \pmod 3$, then $2r^2 = 2p + \epsilon_p \mathrm{tr}(\mathcal{E})$ implies $\mathrm{tr}(\mathcal{E}) \not\equiv
0 \pmod 3$, so when $p \equiv 2 \pmod 3$ either $p^2 - \mathrm{tr}(\mathcal{E}) + 1 = \#\mathcal{E}_{2,\Delta,s}(\mathbb{F}_{p^2})$ or
$p^2 + \mathrm{tr}(\mathcal{E}) + 1 = \#\mathcal{E}'_{2,\Delta,s}(\mathbb{F}_{p^2})$ is divisible by 3. However, when $p \equiv 1 \pmod 3$ we
can hope to find curves of order twice a prime whose twist also has order twice
a prime.


As in Theorem 1, the particular value of $\Delta$ is theoretically irrelevant.

Example 1. Let $p = 2^{80} - 93$ and $\Delta = 2$. For $s = 4556$, we find a twist-secure
curve: $\#\mathcal{E}_{2,2,4556}(\mathbb{F}_{p^2}) = 2N$ and $\#\mathcal{E}'_{2,2,4556}(\mathbb{F}_{p^2}) = 2N'$ where

N = 730750818665451459101729015265709251634505119843 and
N' = 730750818665451459101730957248125446994932083047

are 159-bit primes. Proposition 2 lets us replace 160-bit scalar multiplications in
$\mathcal{E}_{2,2,4556}(\mathbb{F}_{p^2})$ and $\mathcal{E}'_{2,2,4556}(\mathbb{F}_{p^2})$ with 80-bit multiexponentiations.
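The numbers in Example 1 can be cross-checked against the relations used earlier, without any curve arithmetic. The sketch below is only a consistency check of the published values: it confirms that the curve and twist orders sum to $2(p^2 + 1)$, that the trace respects the bound $|\mathrm{tr}(\mathcal{E})| \le 2p$, and that $2r^2 = 2p + \epsilon_p \mathrm{tr}(\mathcal{E})$ has an integer solution $r$. Primality of $N$ and $N'$ is not tested here, and the variable names are illustrative.

```python
from math import isqrt

p = 2**80 - 93
N = 730750818665451459101729015265709251634505119843
N_twist = 730750818665451459101730957248125446994932083047

order, twist_order = 2 * N, 2 * N_twist
trace = p**2 + 1 - order                      # trace of Frobenius over F_{p^2}
eps = 1 if p % 8 in (5, 7) else -1            # sign function of Eq. (4)
r_squared, remainder = divmod(2 * p + eps * trace, 2)
r = isqrt(r_squared)                          # candidate for the integer r of Theorem 1
```

Here $p \equiv 3 \pmod 8$, so $\epsilon_p = -1$, and the value of $r$ recovered this way is what the explicit formulae of Proposition 2 would consume.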

Now consider the degree-3 curves of §6. The order of $\mathcal{E}_{3,\Delta,s}(\mathbb{F}_{p^2})$ is always
divisible by 3: the kernel of $\psi_{3,\Delta,s}$ is generated by the rational point $(3, C_{3,\Delta}(s) - 4)$.
However, on the quadratic twist, the nontrivial points in the kernel of $\psi'_{3,\Delta,s}$ are
not defined over $\mathbb{F}_{p^2}$ (they are conjugates), so $\mathcal{E}'_{3,\Delta,s}(\mathbb{F}_{p^2})$ can have prime order.

Example 2. Let $p = 2^{127} - 1$; then $\Delta = -1$ is a nonsquare in $\mathbb{F}_p$. The parameter
value $s = 122912611041315220011572494331480107107$ yields

$$\#\mathcal{E}_{3,-1,s}(\mathbb{F}_{p^2}) = 3N \quad\text{and}\quad \#\mathcal{E}'_{3,-1,s}(\mathbb{F}_{p^2}) = N' ,$$

where $N$ is a 253-bit prime and $N'$ is a 254-bit prime. Using Proposition 2,
any scalar multiplication in $\mathcal{E}_{3,-1,s}(\mathbb{F}_{p^2})$ or $\mathcal{E}'_{3,-1,s}(\mathbb{F}_{p^2})$ can be computed via a
127-bit multiexponentiation.

Example 3. Let $p = 2^{255} - 19$; then $\Delta = -2$ is a nonsquare in $\mathbb{F}_p$. Taking

s = 52960937784593362700485649923279446947410945689208862015782690291692803003486

yields $\#\mathcal{E}_{3,-2,s}(\mathbb{F}_{p^2}) = 3 \cdot N$ and $\#\mathcal{E}'_{3,-2,s}(\mathbb{F}_{p^2}) = N'$, where $N$ and $N'$ are 509-
and 510-bit primes, respectively. Proposition 2 transforms any 510-bit scalar
multiplication in $\mathcal{E}_{3,-2,s}(\mathbb{F}_{p^2})$ or $\mathcal{E}'_{3,-2,s}(\mathbb{F}_{p^2})$ into a 255-bit multiexponentiation.

8 Alternative Models: Montgomery, Twisted Edwards, 
and Doche-Icart-Kohel 

Montgomery models. The curve $\mathcal{E}_{2,\Delta,s}$ has a Montgomery model over $\mathbb{F}_{p^2}$ if and
only if $2C_{2,\Delta}(s)$ is a square in $\mathbb{F}_{p^2}$ (by [2U Proposition 1]): in that case, setting

$$B_{2,\Delta}(s) := \sqrt{2C_{2,\Delta}(s)} \quad\text{and}\quad A_{2,\Delta}(s) = 12/B_{2,\Delta}(s) ,$$

the birational mapping $(x, y) \mapsto (X/Z, Y/Z) = ((x - 4)/B_{2,\Delta}(s),\, y/B_{2,\Delta}(s)^2)$
takes us from $\mathcal{E}_{2,\Delta,s}$ to the projective Montgomery model

$$\mathcal{E}^{M}_{2,\Delta,s} : B_{2,\Delta}(s)Y^2 Z = X(X^2 + A_{2,\Delta}(s)XZ + Z^2) . \qquad (7)$$

(If $2C_{2,\Delta}(s)$ is not a square, then $\mathcal{E}^{M}_{2,\Delta,s}$ is $\mathbb{F}_{p^2}$-isomorphic to the quadratic twist
$\mathcal{E}'_{2,\Delta,s}$.) These models offer a particularly efficient arithmetic, where we use only
the $X$ and $Z$ coordinates [20]. The endomorphism is defined (on the $X$ and $Z$
coordinates) by

$$\psi_{2,\Delta,s} : (X : Z) \longmapsto \left( X^{2p} + A_{2,\Delta}(s)^p X^p Z^p + Z^{2p} : -2B_{2,\Delta}(s)^{1-p} X^p Z^p \right) .$$
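The passage to Montgomery form can be retraced from the Weierstrass equation of Theorem 1. The derivation below is a sketch that uses only the definitions above; the abbreviations $C$, $A$, $B$ for $C_{2,\Delta}(s)$, $A_{2,\Delta}(s)$, $B_{2,\Delta}(s)$ and the affine shorthand $u = X/Z$, $v = Y/Z$ are notational assumptions made for this sketch.

```latex
\begin{align*}
y^2 &= x^3 + 2(C - 24)\,x - 8(C - 16) = (x - 4)\bigl(x^2 + 4x + 2C - 32\bigr), \\
\intertext{and substituting $x = Bu + 4$, $y = B^2 v$ with $B^2 = 2C$ gives}
B^4 v^2 &= Bu\,\bigl(B^2 u^2 + 12Bu + B^2\bigr)
  \;\Longrightarrow\;
  B v^2 = u\Bigl(u^2 + \tfrac{12}{B}\,u + 1\Bigr) = u\,(u^2 + Au + 1).
\end{align*}
```

The factor $x - 4$ is the 2-torsion point of Theorem 1, which the substitution moves to $u = 0$, the usual position of the distinguished 2-torsion point in a Montgomery model.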

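Twisted Edwards models. Every Montgomery model corresponds to a twisted
Edwards model (and vice versa) [2ll6]. Let

$$a_2(s) = (A_{2,\Delta}(s) + 2)/B_{2,\Delta}(s) \quad\text{and}\quad d_2(s) = (A_{2,\Delta}(s) - 2)/B_{2,\Delta}(s) ;$$

then with $u = X/Z$ and $v = Y/Z$, the birational maps

$$(u, v) \mapsto (x_1, x_2) = \left( \frac{u}{v}, \frac{u - 1}{u + 1} \right) , \qquad (x_1, x_2) \mapsto (u, v) = \left( \frac{1 + x_2}{1 - x_2}, \frac{1 + x_2}{x_1(1 - x_2)} \right)$$

take us between the Montgomery model of Eq. (7) and the twisted Edwards
model

$$\mathcal{E}^{TE}_{2,\Delta,s} : a_2(s)x_1^2 + x_2^2 = 1 + d_2(s)x_1^2 x_2^2 .$$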

Doche-Icart-Kohel models. Doubling-oriented Doche-Icart-Kohel models of
elliptic curves are defined by equations of the form

$$y^2 = x(x^2 + Dx + 16D) .$$

These curves have a rational 2-isogeny $\phi$ with kernel $\langle(0, 0)\rangle$, and $\phi$ and its dual
isogeny $\phi^\dagger$ are both in a special form that allows us to double more quickly by
using the decomposition $[2] = \phi^\dagger\phi$ (see [6, §3.1] for details).

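Our curves $\mathcal{E}_{2,\Delta,s}$ come equipped with a rational 2-isogeny, so it is natural to
try putting them in Doche-Icart-Kohel form. The isomorphism

$$\alpha : (x, y) \longmapsto (u, v) = (\mu^2(x - 4), \mu^3 y) \quad\text{with}\quad \mu = 4\sqrt{6/C_{2,\Delta}(s)}$$

takes us from $\mathcal{E}_{2,\Delta,s}$ into a doubling-oriented Doche-Icart-Kohel model

$$\mathcal{E}^{DIK}_{2,\Delta,s} : v^2 = u(u^2 + D_{2,\Delta}(s)u + 16D_{2,\Delta}(s)) ,$$

where $D_{2,\Delta}(s) = 2^7/(1 + s\sqrt{\Delta})$. While $\mathcal{E}^{DIK}_{2,\Delta,s}$ is defined over $\mathbb{F}_{p^2}$, the isomorphism
is only defined over $\mathbb{F}_{p^2}(\sqrt{1 + s\sqrt{\Delta}})$; so if $1 + s\sqrt{\Delta}$ is not a square in $\mathbb{F}_{p^2}$ then
$\mathcal{E}^{DIK}_{2,\Delta,s}$ is $\mathbb{F}_{p^2}$-isomorphic to $\mathcal{E}'_{2,\Delta,s}$. The endomorphism $\psi^{DIK}_{2,\Delta,s} := \alpha\psi_{2,\Delta,s}\alpha^{-1}$ is
$\overline{\mathbb{F}}_p$-isomorphic to the Doche-Icart-Kohel isogeny (they have the same kernel).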

Similarly, we can exploit the rational 3-isogeny on $\mathcal{E}_{3,\Delta,s}$ for Doche-Icart-Kohel
tripling (see [6, §3.2]). Let $a_{3,\Delta}(s) = 9/C_{3,\Delta}(s)$ and $b_{3,\Delta}(s) = a_{3,\Delta}(s)^{-1/2}$;
then the isomorphism $(x, y) \mapsto (u, v) = (a_{3,\Delta}(s)(x/3 - 1),\, b_{3,\Delta}(s)^3 y)$ takes us
from $\mathcal{E}_{3,\Delta,s}$ to the tripling-oriented Doche-Icart-Kohel model

$$\mathcal{E}^{DIK}_{3,\Delta,s} : v^2 = u^3 + 3a_{3,\Delta}(s)(u + 1)^2 .$$

9 Degree One: GLS as a Degenerate Case

Returning to the framework of §3, suppose $\mathcal{E}$ is a curve defined over $\mathbb{Q}$ and
base-extended to $\mathbb{Q}(\sqrt{D})$: then $\mathcal{E} = {}^\sigma\mathcal{E}$, and we can apply the construction of §3
taking $\phi : \mathcal{E} \to {}^\sigma\mathcal{E}$ to be the identity map. Reducing modulo an inert prime $p$,
the endomorphism $\psi$ is nothing but $\pi_0$ (which is an endomorphism, since $\mathcal{E}$ is
a subfield curve). We have $\psi^2 = \pi_0^2 = \pi_{\mathcal{E}}$, so the eigenvalue of $\psi$ is $\pm 1$ on
cryptographic subgroups of $\mathcal{E}(\mathbb{F}_{p^2})$. Clearly, this endomorphism is of no use to
us for scalar decompositions.

However, looking at the quadratic twist $\mathcal{E}'$, the twisted endomorphism $\psi'$
satisfies $(\psi')^2 = -\pi_{\mathcal{E}'}$; the eigenvalue of $\psi'$ on cryptographic subgroups is a
square root of $-1$. We have recovered the Galbraith-Lin-Scott endomorphism
(cf. [11, Theorem 2]).

More generally, suppose $\phi : \mathcal{E} \to {}^\sigma\mathcal{E}$ is a $\overline{\mathbb{Q}}$-isomorphism: that is, an isogeny
of degree 1. If $\mathcal{E}$ does not have CM, then ${}^\sigma\phi = \epsilon_p \phi^{-1}$, so $\psi^2 = [\epsilon_p]\pi_{\mathcal{E}}$ with
$\epsilon_p = \pm 1$. This situation is isomorphic to GLS. In fact, $\mathcal{E} \cong {}^\sigma\mathcal{E}$ implies $j(\mathcal{E}) =
j({}^\sigma\mathcal{E}) = {}^\sigma j(\mathcal{E})$; so $j(\mathcal{E})$ is in $\mathbb{Q}$, and $\mathcal{E}$ is isomorphic to (or a quadratic twist of)
a curve defined over $\mathbb{Q}$. We note that in the case $d = 1$, we have $r = \pm t_0$ in
Proposition 1 where $t_0$ is the trace of $\pi_0$, and the basis constructed in the proof
of Proposition 2 is (up to sign) the same as the basis of [11, Lemma 3].

While $\mathcal{E}'(\mathbb{F}_{p^2})$ may have prime order, $\mathcal{E}(\mathbb{F}_{p^2})$ cannot: the points fixed by $\pi_0$
form a subgroup of order $p + 1 - t_0$, where $t_0^2 - 2p = \mathrm{tr}(\mathcal{E})$ (the complementary
subgroup, where $\pi_0$ has eigenvalue $-1$, has order $p + 1 + t_0$). We see that the largest
prime divisor of $\#\mathcal{E}(\mathbb{F}_{p^2})$ can be no larger than $O(p)$. If we are in a position to
apply the Fouque-Lercier-Réal-Valette fault attack [8] (for example, if Montgomery
ladders are used for scalar multiplication and multiexponentiation), then
we can solve DLP instances in $\mathcal{E}'(\mathbb{F}_{p^2})$ in $O(p^{1/2})$ group operations (in the worst
case!). While $O(p^{1/2})$ is still exponentially difficult, it falls far short of the ideal
$O(p)$ for general curves over $\mathbb{F}_{p^2}$. GLS curves should therefore be avoided where
the fault attack can be put into practice.
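The claim that a subfield curve cannot have prime order over $\mathbb{F}_{p^2}$ follows from a one-line factorization. As a short worked step, using only the relation $t_0^2 - 2p = \mathrm{tr}(\mathcal{E})$ stated above:

```latex
\begin{align*}
\#\mathcal{E}(\mathbb{F}_{p^2})
  &= p^2 + 1 - \mathrm{tr}(\mathcal{E})
   = p^2 + 1 - (t_0^2 - 2p) \\
  &= (p + 1)^2 - t_0^2
   = (p + 1 - t_0)(p + 1 + t_0) .
\end{align*}
```

Both factors are of size roughly $p$, which is where the $O(p)$ bound on the largest prime divisor comes from.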

10 CM Specializations 

By definition, $\mathbb{Q}$-curves do not have CM. However, some exceptional fibres of
the families $\mathcal{E}_{2,\Delta,s}$ and $\mathcal{E}_{3,\Delta,s}$ do have CM. There are only finitely many such
curves over any given $\mathbb{Q}(\sqrt{\Delta})$; following Quer ([23, §5] and [211 §6]), we give
an exhaustive list of the corresponding parameter values in Tables 1 and 2. In
each table, if $\Delta$ is a squarefree discriminant and there exists $s$ in $\mathbb{Q}$ such that
$1/(s^2\Delta - 1)$ takes the first value in a column, then the curve $\mathcal{E}_{d,\Delta,s}/\mathbb{Q}(\sqrt{\Delta})$ has
CM by the quadratic order of discriminant $D$ specified by the second value.

Suppose we have chosen $d$, $\Delta$, and $s$ such that $\mathcal{E}_{d,\Delta,s}$ is a CM-curve. If the
discriminant of the associated CM order is small, then we can compute an explicit
endomorphism of $\mathcal{E}_{d,\Delta,s}$ of small degree, which then yields an efficient
endomorphism $\rho$ (say) on the reduction $\mathcal{E}_{d,\Delta,s}$ modulo $p$ (as in the GLV construction). If
$p$ is inert, then we also have the degree-$dp$ endomorphism $\psi$ constructed above.

Table 1. CM specializations of $\mathcal{E}_{2,\Delta,s}$ (cf. Quer [23, §5])

1/(s^2 Δ - 1) |   4 |  -9 |  48 | -81 | 324 | -2401 | -9801 | 25920 | 777924 | -96059601
D             | -20 | -24 | -36 | -40 | -52 |   -72 |   -88 |  -100 |   -148 |      -232

Table 2. CM specializations of $\mathcal{E}_{3,\Delta,s}$ (cf. Quer [2U §6])

1/(s^2 Δ - 1) | 1/4 |  -2 | -27/2 |  16 | -125/4 |  80 | 1024 | 3024 | 250000
D             | -15 | -24 |   -48 | -51 |    -60 | -75 | -123 | -147 |   -267

Combinations of $\rho$ and $\psi$ may be used for four-dimensional scalar decompositions;
for example, the endomorphisms $[1]$, $\rho$, $\psi$, $\rho\psi$ can be used as a basis for the
4-dimensional decomposition techniques elaborated by Longa and Sica in [18].

In fact, reducing these CM fibres modulo a well-chosen $p$ turns out to form a
simple alternative construction for some of the curves investigated by Guillevic
and Ionica in [14]: the twisted curve $\mathcal{E}'_{2,\Delta,s}$ coincides with the curve $E_{1,c}$ of [14,
§2] when $c = s\sqrt{\Delta}$, while $\mathcal{E}_{3,\Delta,s}$ is the curve $E_{2,c}$ of [14, §2] when $c = -2s\sqrt{\Delta}$.
The almost-prime-order 254-bit curve of [14, Example 1] corresponds to the
reduction modulo $p$ of a twist of one of the curves in the column of Table 1
with $1/(s^2\Delta - 1) = 4$. This curve has an efficient CM endomorphism (a square
root of $[-5]$) as well as an endomorphism of degree $2p$; these endomorphisms are
combined to compute short 4-dimensional scalar decompositions.

From the point of view of scalar multiplication, using CM fibres of these
families allows us to pass from 2-dimensional to 4-dimensional scalar decompositions,
with a consequent speedup. However, in restricting to CM fibres we also
re-impose the chief drawback of GLV on ourselves: that is, as explained in the
introduction, we cannot hope to find secure (and twist-secure) curves over $\mathbb{F}_{p^2}$
when $p$ is fixed. In practice, this means that the 4-dimensional scalar decomposition
speedup comes at the cost of suboptimal field arithmetic; we pay for
shorter loop lengths with comparatively slower group operations.

We must therefore make a choice between 4-dimensional decompositions and
fast underlying field arithmetic. In this article we have chosen the latter option,
so we will not treat CM curves in depth here (we refer the reader to [14] instead).

11 Higher Degrees 

We conclude with some brief remarks on $\mathbb{Q}$-curves of other degrees. Hasegawa
provides a universal curve for $d = 7$ (and any $\Delta$) in [15, Theorem 2.2], and our
results for $d = 2$ and $d = 3$ carry over to $d = 7$ in an identical fashion, though
the endomorphism is slightly less efficient in this case (its defining polynomials
are sextic).

For $d = 5$, Hasegawa notes that it is impossible to give a universal $\mathbb{Q}$-curve for
every discriminant $\Delta$: there exists a quadratic $\mathbb{Q}$-curve of degree 5 over $\mathbb{Q}(\sqrt{\Delta})$
if and only if $(5/p_i) = 1$ for every prime $p_i \neq 5$ dividing $\Delta$ [15, Proposition
2.3]. But this is no problem when reducing modulo $p$, if we are prepared to give
up total freedom in choosing $\Delta$: we can take $\Delta = -11$ for $p \equiv 1 \pmod 4$ and
$\Delta = -1$ for $p \equiv 3 \pmod 4$, and then use the curves defined in [15, Table 6]. The
generic curves here do not have rational torsion points; it is therefore possible
for the reductions and their twists to have prime order.

Composite degree $\mathbb{Q}$-curves (such as $d = 6$ and $10$) promise more interesting
results. Degrees greater than 10 yield less efficient endomorphisms, and so are
less interesting from a practical point of view.

Abstract. The Gallant-Lambert-Vanstone (GLV) algorithm uses efficiently
computable endomorphisms to accelerate the computation of
scalar multiplication of points on an abelian variety. Freeman and Satoh
proposed for cryptographic use two families of genus 2 curves defined over
$\mathbb{F}_p$ which have the property that the corresponding Jacobians are (2, 2)-isogenous
over an extension field to a product of elliptic curves defined
over $\mathbb{F}_{p^2}$. We exploit the relationship between the endomorphism rings
of isogenous abelian varieties to exhibit efficiently computable endomorphisms
on both the genus 2 Jacobian and the elliptic curve. This leads
to a four-dimensional GLV method on Freeman and Satoh's Jacobians
and on two new families of elliptic curves defined over $\mathbb{F}_{p^2}$.

Keywords: GLV method, elliptic curves, genus 2 curves, isogenies.

1 Introduction 

The scalar multiplication of a point on a small-dimension abelian variety is one of
the most important operations used in curve-based cryptography. Various techniques
have been introduced to speed up scalar multiplication. Firstly, there exist
exponent-recoding techniques such as sliding window and Non-Adjacent-Form
representation [7]. These techniques are valid for generic groups and are improved
for elliptic curves, as the inversion (or negation in additive notation) is free.

Secondly, in 2001, Gallant, Lambert and Vanstone [11] introduced a method
which uses endomorphisms on the elliptic curve to decompose the scalar multiplication
into a 2-dimensional multi-multiplication. Given an elliptic curve $E$ over
a finite field $\mathbb{F}_p$ with a fast endomorphism $\phi$ and a point $P$ of large prime order
$r$ such that $\phi(P) = [\lambda]P$, the computation of $[k]P$ is decomposed as

$$[k]P = [k_1]P + [k_2]\phi(P),$$

with $k \equiv k_1 + \lambda k_2 \pmod r$ such that $|k_1|, |k_2| \simeq \sqrt{r}$. Gallant et al. provided
examples of curves whose endomorphism $\phi$ is given by complex multiplication
by $\sqrt{-1}$ ($j$-invariant $j = 1728$), $\frac{-1 + \sqrt{-3}}{2}$ ($j = 0$), $\sqrt{-2}$ ($j = 8000$) and $\frac{1 + \sqrt{-7}}{2}$
($j = -3375$). In 2009 Galbraith, Lin and Scott [10] presented a method to construct
an efficient endomorphism on elliptic curves $E$ defined over $\mathbb{F}_{p^2}$ which are
quadratic twists of elliptic curves defined over $\mathbb{F}_p$. In this case, a fast endomorphism
$\psi$ is obtained by carefully exploiting the Frobenius endomorphism. This
endomorphism verifies the equation $\psi^2 + 1 = 0$ when restricted to points defined
over $\mathbb{F}_{p^2}$. In 2012, Longa and Sica improved the GLS construction, by showing
that a 4-dimensional decomposition of scalar multiplication is possible on GLS
curves allowing efficient complex multiplication $\phi$. Let $\lambda$, $\mu$ denote the eigenvalues
of the two endomorphisms $\phi$, $\psi$. Then we can decompose the scalar $k$ into
$k = k_0 + k_1\lambda + k_2\mu + k_3\lambda\mu$ and compute

$$[k]P = [k_0]P + [k_1]\phi(P) + [k_2]\psi(P) + [k_3]\phi \circ \psi(P).$$

Moreover, Longa and Sica provided an efficient algorithm to compute decompositions
of $k$ such that $|k_i| < Cr^{1/4}$, $i = 1, \ldots, 4$. Note that most curves presented
in the literature have particular $j$-invariants. GLV curves have $j$-invariant 0,
1728, 8000, or $-3375$, while GLS curves have $j$-invariant in $\mathbb{F}_p$, even though they
are defined over $\mathbb{F}_{p^2}$.

In 2013, Bos, Costello, Hisil and Lauter proposed in [3] a 4-dimensional GLV 
technique to speed-up scalar multiplication in genus 2. They considered the 
Buhler-Koblitz genus 2 curves y 2 = x 5 +b and the Furukawa-Kawazoe-Takahashi 
curves y 2 = x 5 + ax. These two curves have a very efficient dimension-4 GLV 
technique available. 

In this paper we study GLV decompositions on two types of abelian varieties:

— Elliptic curves defined over $\mathbb{F}_{p^2}$, with $j$-invariant defined over $\mathbb{F}_{p^2}$.

— Jacobians of genus 2 curves defined over $\mathbb{F}_p$, which are isogenous over an
extension field to a product of elliptic curves defined over $\mathbb{F}_{p^2}$.

First, we study a family of elliptic curves whose equation is of the form
$E_{1,c}(\mathbb{F}_{p^2}) : y^2 = x^3 + 27(10 - 3c)x + 14 - 9c$ with $c \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$, $c^2 \in \mathbb{F}_p$. These
curves have an endomorphism $\Phi$ satisfying $\Phi^2 \pm 2 = 0$ for points defined over
$\mathbb{F}_{p^2}$. Nevertheless, the complex multiplication discriminant of the curve is not
2, but of the form $-D = -2D'$. The second family is given by elliptic curves
with equation of the form $E_{2,c}(\mathbb{F}_{p^2}) : y^2 = x^3 + 3(2c - 5)x + c^2 - 14c + 22$ with
$c \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$, $c^2 \in \mathbb{F}_p$. We show that these curves have an endomorphism $\Phi$ such
that $\Phi^2 + 3 = 0$ for points defined over $\mathbb{F}_{p^2}$. The complex multiplication discriminant
of the curve $E_{2,c}$ is of the form $-D = -3D'$. Our construction is a simple
and efficient way to exploit the existence of a $p$-power Frobenius endomorphism
on the Weil restriction of these curves. If the discriminant $D$ is small, we propose
a 4-dimensional GLV algorithm for the $E_{1,c}$ and $E_{2,c}$ families of curves. We use
Vélu's formulas to compute explicitly the endomorphisms on $E_{1,c}$ and $E_{2,c}$.

Finally, we study genus 2 curves whose equations are $C_1 : Y^2 = X^5 + aX^3 + bX$
and $C_2 : Y^2 = X^6 + aX^3 + b$, with $a, b \in \mathbb{F}_p$. The Jacobians of these curves split
over an extension field into two isogenous elliptic curves. More precisely, the Jacobian
of $C_1$ is isogenous to $E_{1,c} \times E_{1,c}$ and the Jacobian of $C_2$ is isogenous to
$E_{2,c} \times E_{2,-c}$. These two Jacobians were proposed for use in cryptography by
Satoh [T5] and Freeman and Satoh [9], who showed that they are isogenous over
$\mathbb{F}_p$ to the Weil restriction of a curve of the form $E_{1,c}$ or $E_{2,c}$. This property is
exploited to derive fast point counting algorithms and pairing-friendly constructions.
We investigate efficient scalar multiplication via the GLV technique on
Satoh and Freeman's Jacobians. We give explicit formulae for the (2, 2)-isogeny
between the product of elliptic curves and the Jacobian of the genus 2 curve.
As a consequence, we derive a method to efficiently compute endomorphisms on
the Jacobians of $C_1$ and $C_2$.

This paper is organized as follows. In Section 2 we review the construction of
(2, 2)-isogenies between Jacobians of $C_1$ and $C_2$ and products of elliptic curves. In
Sections 3 and 4 we give our construction of efficient endomorphisms on $E_{1,c}$ and
$E_{2,c}$ and derive a four-dimensional GLV algorithm on these curves. Section 5
explains how to obtain a four-dimensional GLV method on the Jacobians of
$C_1$ and $C_2$. Finally, in Section 6 our operation count at the 128-bit security
level shows that both elliptic curves defined over $\mathbb{F}_{p^2}$ and Satoh and Freeman's
Jacobians yield scalar multiplication algorithms competitive with those of Longa
and Sica and Bos et al.

2 Elliptic Curves with a Genus 2 Cover 

In this paper we will work with two examples of genus 2 curves whose Jacobians
allow over an extension field a (2, 2)-isogeny to a product of elliptic curves. We
first study the genus 2 curve

$$C_1(\mathbb{F}_p) : Y^2 = X^5 + aX^3 + bX, \quad\text{with } a, b \neq 0 \in \mathbb{F}_p. \qquad (1)$$

It was shown [15118191 §2, §3, §4.1] that the Jacobian of $C_1$ is isogenous to $E_{1,c} \times
E_{1,c}$, where

$$E_{1,c}(\mathbb{F}_p[\sqrt{b}]) : y^2 = (c + 2)x^3 - (3c - 10)x^2 + (3c - 10)x - (c + 2) \qquad (2)$$

with c = a/Vb. We recall the formulae for the cover maps from C\ to E-y c . The 
reader is referred to the proof of Prop. 4.1 in [9] for details of the computations. 


<Pi ■ Ci(F p ) -> E ltC (F p [Vb}) <p 2 : C,( F,) -4 E hc (¥ p [^b}) 



where i = V— I € F p or F p 2 . The (2, 2)-isogeny is given by 

I : Jci -t E\ t c X Et >e 

P + Q-2P ao ^ (^*(P) + puiQ), ^2 *(P) + V2*{Q)) 
and its dual is 


I : E 1>c X E ljC -4 J Cl 

(5i,5 2 )^^(5i)+^(S 2 )-4P 0 
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with (p*(Si) = 
and y >2 (S 2 ) = [ 




(W^P 


Note that $\mathcal{I}$ and its dual are defined over an extension field of $\mathbb{F}_p$ of degree
1, 2, 4 or 8. One may easily check that $\mathcal{I} \circ \hat{\mathcal{I}} = [2]$ and $\hat{\mathcal{I}} \circ \mathcal{I} = [2]$. Since $\mathcal{I}$
splits multiplication by 2, an argument similar to [pn Prop. 21] implies that
$2\,\mathrm{End}(J_{C_1}) \subset \mathrm{End}(E_{1,c} \times E_{1,c})$ and $2\,\mathrm{End}(E_{1,c} \times E_{1,c}) \subset \mathrm{End}(J_{C_1})$. We will use
these inclusions to exhibit efficiently computable endomorphisms on both $J_{C_1}$
and $E_{1,c}$.

Secondly, we consider an analogous family of degree 6 curves. These curves
were studied by Duursma and Kiyavash [8] and by Gaudry and Schost [12].

$$C_2(\mathbb{F}_p) : Y^2 = X^6 + aX^3 + b \quad\text{with } a, b \neq 0 \in \mathbb{F}_p. \qquad (5)$$

The Jacobian of the curve, denoted $J_{C_2}$, is isogenous to the product of elliptic
curves $E_{2,c} \times E_{2,-c}$, where

$$E_{2,c}(\mathbb{F}_p[\sqrt{b}]) : y^2 = (c + 2)x^3 + (-3c + 30)x^2 + (3c + 30)x + (-c + 2) \qquad (6)$$
$$E_{2,-c}(\mathbb{F}_p[\sqrt{b}]) : y^2 = (-c + 2)x^3 + (3c + 30)x^2 + (-3c + 30)x + (c + 2), \qquad (7)$$

with $c = a/\sqrt{b}$. The construction of the isogeny is similar to the one for $\mathcal{I}$. We
recall the formulae for cover maps from $C_2$ to $E_{2,c}$ and to $E_{2,-c}$. For detailed
computations, the reader is referred to Freeman and Satoh [9, Prop. 4].

n ■ C 2 (F p ) -a £ 2 , c x £ 2 ,_ c (F p [^]) 

(x,y).-> » ( x-^5 )3 ) > ((frll) > (x+W)} (8) 

Note that the isogeny constructed using these cover maps is defined over an
extension field of degree 1, 2, 3 or 6.

3 Four-Dimensional GLV on $E_{1,c}$

In this section, we construct two endomorphisms which may be used to compute
scalar multiplication on $E_{1,c}$ using a 4-dimensional GLV algorithm. We assume
that $c \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$ and $c^2 \in \mathbb{F}_p$.

3.1 First Endomorphism on $E_{1,c}$ with Vélu's Formulas

We aim to compute a 2-isogeny on $E_{1,c}(\mathbb{F}_{p^2})$. First we reduce the equation (2)
of $E_{1,c}$ to

$$E_{1,c}(\mathbb{F}_{p^2}) : y^2 = x^3 + 27(3c - 10)x - 108(9c - 14) \qquad (9)$$

through the change of variables $(x, y) \mapsto (3(c + 2)x - (3c - 10), (c + 2)y)$. Note
that we can write

$$E_{1,c}(\mathbb{F}_{p^2}) : y^2 = (x - 12)(x^2 + 12x + 81c - 126). \qquad (10)$$

Hence there always exists a 2-torsion point $P_2 = (12, 0)$ on $E_{1,c}(\mathbb{F}_{p^2})$. We apply
Vélu's formulas [20IHI14] to compute the isogeny whose kernel is generated by
$P_2$. We obtain an isogeny from $E_{1,c}$ into $E_b : y_b^2 = x_b^3 + b_4 x_b + b_6$ with $b_4 =
-2^2 \cdot 27(3c + 10)$, $b_6 = -2^3 \cdot 108(14 + 9c)$. We observe that $E_b$ is isomorphic to
the curve whose equation is

$$E_{1,-c}(\mathbb{F}_{p^2}) : y^2 = x^3 + 27(-3c - 10)x + 108(14 + 9c) \qquad (11)$$

through $(x_b, y_b) \mapsto (x_b/(-2),\, y_b/(-2\sqrt{-2}))$. Note that $\sqrt{-2} \in \mathbb{F}_{p^2}$ and thus this
isomorphism is defined over $\mathbb{F}_{p^2}$. We define the isogeny


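$$\mathcal{I}_2 : E_{1,c}(\mathbb{F}_{p^2}) \to E_{1,-c}(\mathbb{F}_{p^2}), \quad (x, y) \mapsto \left( \frac{-x}{2} - \frac{162 + 81c}{2(x - 12)} ,\; \frac{-y}{2\sqrt{-2}} \left( 1 - \frac{162 + 81c}{(x - 12)^2} \right) \right) . \qquad (12)$$

We show that we can use this isogeny to get an efficiently computable endomorphism
on $E_{1,c}$. Observe that since $c \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$ and $c^2 \in \mathbb{F}_p$, we have that

$$\pi_p(c) = c^p = -c, \qquad \pi_p(j(E_{1,c})) = j(E_{1,-c}), \qquad (13)$$

hence the curves $E_{1,c}$ and $E_{1,-c}$ are isogenous over $\mathbb{F}_{p^2}$ via the Frobenius map
$\pi_p$. They are not isomorphic, because they do not have the same $j$-invariant.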
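The factorization in Eq. (10) and the codomain of the Vélu quotient can both be checked by direct expansion. The derivation below is a sketch; the intermediate names $v$ and $w$ follow the usual presentation of Vélu's formulas for a 2-torsion kernel point $(x_0, 0)$ on $y^2 = x^3 + ax + b$, and are assumptions of notation rather than symbols used in the text.

```latex
\begin{align*}
(x - 12)(x^2 + 12x + 81c - 126)
  &= x^3 + (81c - 270)\,x - (972c - 1512)
   = x^3 + 27(3c - 10)\,x - 108(9c - 14), \\
v &= 3x_0^2 + a = 3 \cdot 12^2 + 27(3c - 10) = 162 + 81c, \qquad w = x_0 v = 12\,(162 + 81c), \\
a - 5v &= -108\,(3c + 10), \qquad b - 7w = -864\,(14 + 9c), \\
\frac{a - 5v}{(-2)^2} &= 27(-3c - 10), \qquad \frac{b - 7w}{(-2)^3} = 108\,(14 + 9c).
\end{align*}
```

The last line applies the scaling $(x_b, y_b) \mapsto (x_b/(-2), y_b/(-2\sqrt{-2}))$, whose square and cube factors are $(-2)^2$ and $(-2)^3$, and recovers exactly the coefficients of Eq. (11).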

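To sum up, by composing $\pi_p \circ \mathcal{I}_2$, we obtain an efficiently computable endomorphism
$\Phi_2$ as follows:

$$\Phi_2 : E_{1,c}(\mathbb{F}_{p^2}) \to E_{1,c}(\mathbb{F}_{p^2})$$

$$(x, y) \mapsto \left( \frac{x^p}{-2} - \frac{162 - 81c}{2(x^p - 12)} ,\; \frac{y^p}{-2\sqrt{-2}^{\,p}} \left( 1 - \frac{162 - 81c}{(x^p - 12)^2} \right) \right)
= \left( \frac{x^{2p} - 12x^p + 162 - 81c}{-2(x^p - 12)} ,\; y^p\, \frac{x^{2p} - 24x^p - 18 + 81c}{-2\sqrt{-2}^{\,p}\,(x^p - 12)^2} \right) .$$

If we compute formally^1 $\Phi_2^2$ then we obtain exactly the formulas to compute
$\pi_{p^2} \circ [-2]$ on $E_{1,c}(\mathbb{F}_{p^2})$ if $\sqrt{-2} \in \mathbb{F}_p$ or $\pi_{p^2} \circ [2]$ if $\sqrt{-2} \notin \mathbb{F}_p$. This difference
occurs because a term $\sqrt{-2}\,\sqrt{-2}^{\,p}$ appears in the formula. If $p \equiv 1, 3 \bmod 8$,
$\sqrt{-2}^{\,p} = \sqrt{-2}$ and if $p \equiv 5, 7 \bmod 8$, $\sqrt{-2}^{\,p} = -\sqrt{-2}$. Hence $\Phi_2$ restricted to
points defined over $\mathbb{F}_{p^2}$ verifies the equation

$$\Phi_2^2 \pm 2 = 0. \qquad (14)$$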

We note that the above construction does not come as a surprise. Since
$2\,\mathrm{End}(J_{C_1}) \subset \mathrm{End}(E_{1,c} \times E_{1,c})$ and since the Jacobian $J_{C_1}$ is equipped with
a $p$-power Frobenius endomorphism, we deduce that there are endomorphisms
with inseparability degree $p$ on the elliptic curve $E_{1,c}$. Our construction is simply
an explicit method to compute such an endomorphism.

1 E.g. Verification code with Maple can be found at the address
http://www.di.ens.fr/~ionica/VerificationMaple-Isogeny-2p-E1.maple

Two-Dimensional GLV. By using Id and $\Phi_2$, we get a two-dimensional GLV
algorithm on the curve Ei c . Smith P! constructs families of 2-dimensional 
GLV curves by reducing mod p Q- curves defined over quadratic number fields. 
Q-curves are curves without complex multiplication with isogenies towards all 
their Galois conjugates. Since we are interested in designing a fast higher
dimensional algorithm, we will study curves with small complex multiplication
discriminant. For this purpose, our curves are constructed using the complex
multiplication method. For a discussion on the advantages of using dimension 2 
curves, see [15] . 


3.2 Efficient Complex Multiplication on $E_{1,c}(\mathbb{F}_{p^2})$

We suppose that the complex multiplication discriminant $D$ of the curve $E_{1,c}$
is small. A natural way to obtain an efficiently computable endomorphism is
to take $\Phi_D$, the generator of the endomorphism ring (i.e. $\sqrt{-D}$). Guillevic and
Vergnaud [13, proof of Th. 1 (4.) §2.2] showed that $D = 2D'$, for some integer
$D'$. Let $t_{p^2}$ be the trace of $E_{1,c}(\mathbb{F}_{p^2})$. The equation of the complex multiplication
is then

\[ (t_{p^2})^2 - 4p^2 = -2D'\gamma^2, \tag{15} \]

for some $\gamma \in \mathbb{Z}$. We prove that there is an endomorphism on $E_{1,c}$ whose degree of
separability is $D'$. In order to do that, we will need to compute first the general
equation of $\Phi_2$.

Lemma 1. There are integers $m$ and $n$ such that if $p \equiv 1, 3 \pmod 8$, then

\[ t_{p^2} + 2p = D'm^2 \quad \text{and} \quad t_{p^2} - 2p = -2n^2, \tag{16} \]

and if $p \equiv 5, 7 \pmod 8$, then

\[ t_{p^2} + 2p = 2n^2 \quad \text{and} \quad t_{p^2} - 2p = -D'm^2. \tag{17} \]

Moreover, the characteristic equation of $\Phi_2$ is

\[ \Phi_2^2 - 2n\,\Phi_2 + 2p\,\mathrm{Id} = 0. \tag{18} \]

Proof. We have that $\mathrm{Tr}(\Phi_2^2) - \mathrm{Tr}^2(\Phi_2) + 2\deg(\Phi_2) = 0$. We know that $\deg(\Phi_2) = 2p$
because $\Phi_2 = \pi_p \circ I_2$ and $\deg(\pi_p) = p$, $\deg(I_2) = 2$, so $\mathrm{Tr}^2(\Phi_2) = \mathrm{Tr}(\Phi_2^2) + 4p$.
Now, if $p \equiv 1, 3 \bmod 8$, $\mathrm{Tr}(\Phi_2^2) = \mathrm{Tr}(\pi_{p^2} \circ [-2]) = -2t_{p^2}$ and we get $\mathrm{Tr}^2(\Phi_2) =
-2t_{p^2} + 4p = -2(t_{p^2} - 2p)$. We may thus write $t_{p^2} - 2p = -2n^2$, for some integer
$n$. If $p \equiv 5, 7 \bmod 8$, $\mathrm{Tr}(\Phi_2^2) = \mathrm{Tr}(\pi_{p^2} \circ [2]) = 2t_{p^2}$ and we get $\mathrm{Tr}^2(\Phi_2) = 2t_{p^2} +
4p = 2(t_{p^2} + 2p)$. Hence $t_{p^2} + 2p = 2n^2$. In both cases $\mathrm{Tr}(\Phi_2) = 2n$. Using the complex multiplication
equation (15), we have that there is an integer $m$ such that $t_{p^2} + 2p = D'm^2$
if $p \equiv 1, 3 \pmod 8$ and $t_{p^2} - 2p = -D'm^2$ if $p \equiv 5, 7 \pmod 8$. Using these
notations, the characteristic equation of $\Phi_2$ is

\[ \Phi_2^2 - 2n\,\Phi_2 + 2p\,\mathrm{Id} = 0. \]

Theorem 1. Let $E_{1,c}$ be an elliptic curve given by equation (HU . defined over
$\mathbb{F}_{p^2}$. Let $-D$ be the complex multiplication discriminant and consider $D'$ such
that $D = 2D'$. There is an endomorphism $\Phi_{D'}$ of $E_{1,c}$ with degree of separability
$D'$. The characteristic equation of this endomorphism is

\[ \Phi_{D'}^2 - D'm\,\Phi_{D'} + D'p\,\mathrm{Id} = 0. \tag{19} \]

Proof. Since $D = 2D'$, we have that $\Phi_D$ is the composition of a horizontal isogeny
of degree 2 with a horizontal isogeny of degree $D'$. We denote by $I_2 : E_{1,c} \to
E_{1,-c}$ the isogeny given by equation (12). Note that $I_2$ is a horizontal isogeny
of degree 2. Indeed, since $\pi_p : E_{1,c} \to E_{1,-c}$, it follows that $(\mathrm{End}(E_{1,c}))_2 \simeq
(\mathrm{End}(E_{1,-c}))_2$. Since $2 \mid D$, there is a unique horizontal isogeny of degree 2 starting
from $E_{1,c}$. Hence the complex multiplication endomorphism on $E_{1,c}$ is $\Phi_D =
I_{D'} \circ I_2$, with $I_{D'} : E_{1,-c} \to E_{1,c}$ a horizontal isogeny of degree $D'$. We define
$\Phi_{D'} = I_{D'} \circ \pi_p$, with $\pi_p : E_{1,c} \to E_{1,-c}$. To compute the characteristic polynomial
of $\Phi_{D'}$, we observe that

\[ \Phi_{D'} \circ \Phi_2 = \Phi_D \circ \pi_{p^2}. \]

Hence, by using equation (15), we obtain that $\Phi_{D'}$ seen as algebraic integer in
$\mathbb{Z}[\sqrt{-D}]$ is $\frac{D'm \pm n\sqrt{-2D'}}{2}$. Hence we have $\Phi_{D'}^2 - D'm\,\Phi_{D'} + D'p\,\mathrm{Id} = 0$.

The endomorphism $\Phi_{D'}$ constructed in Theorem 1 is thus computed as the
composition of a horizontal isogeny with the $p$-power of the Frobenius. Since
computing the $p$-power Frobenius for extension fields of degree 2 costs one negation,
we conclude that $\Phi_{D'}$ may be computed with Vélu's formulae with half the
operations needed to compute $\Phi_D$ over $\mathbb{F}_{p^2}$.


Four-Dimensional GLV Algorithm. Assume that $E_{1,c}$ is such that $\#E_{1,c}(\mathbb{F}_{p^2})$
is divisible by a large prime of cryptographic size. Let $\Psi = \Phi_{D'}$ and $\Phi = \Phi_2$.
We observe that $\Phi$ and $\Psi$, viewed as algebraic integers, generate disjoint quadratic
extensions of $\mathbb{Q}$. Consequently, one may use $1, \Phi, \Psi, \Phi\Psi$ to compute the scalar
multiple $[k]P$ of a point $P \in E_{1,c}(\mathbb{F}_{p^2})$ using a four-dimensional GLV algorithm.
We do not give here the details of the algorithm which computes decompositions

\[ k = k_1 + k_2\lambda + k_3\mu + k_4\lambda\mu \]

with $\lambda$ and $\mu$ the eigenvalues of $\Phi$ and $\Psi$ and $|k_i| < C r^{1/4}$. Such an algorithm is
obtained by working over $\mathbb{Z}[\Phi, \Psi]$, using a similar analysis to the one proposed
by Longa and Sica [TH] .
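Once a decomposition $k = k_1 + k_2\lambda + k_3\mu + k_4\lambda\mu$ is available, the scalar multiple is evaluated as a four-term multi-scalar multiplication. The following is a minimal sketch of the usual simultaneous double-and-add with a precomputed table of the $2^4 = 16$ subset sums of $P, \Phi(P), \Psi(P), \Phi\Psi(P)$. It assumes non-negative sub-scalars and works in any additive group given by an `add` function; the function name and the toy group $\mathbb{Z}/r\mathbb{Z}$ used to exercise it are hypothetical and are not the paper's implementation.

```python
def multi_scalar_mul(scalars, points, add, zero):
    """Compute sum(k_i * P_i) with one shared doubling chain (Straus-Shamir style).

    scalars: non-negative integers k_1..k_d
    points:  group elements P_1..P_d (here P, Phi(P), Psi(P), Phi(Psi(P)))
    add:     group law, zero: neutral element
    """
    d = len(points)
    # table[mask] = sum of the points whose index bit is set in mask (2^d entries)
    table = [zero] * (1 << d)
    for mask in range(1, 1 << d):
        low = mask & -mask                      # lowest set bit of mask
        table[mask] = add(table[mask ^ low], points[low.bit_length() - 1])

    nbits = max(k.bit_length() for k in scalars)
    acc = zero
    for bit in reversed(range(nbits)):
        acc = add(acc, acc)                     # one doubling per bit position
        mask = 0
        for i, k in enumerate(scalars):
            if (k >> bit) & 1:
                mask |= 1 << i
        if mask:
            acc = add(acc, table[mask])         # at most one addition per bit
    return acc
```

With sub-scalars of about a quarter of the bit length of $k$, the doubling chain is about four times shorter than for a plain double-and-add, which is where the gain of the four-dimensional method comes from.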


Eigenvalue Computation. From equation (14), we deduce that the eigenvalue
of $\Phi_2$ is $p\sqrt{-2}$ if $p \equiv 1, 3 \bmod 8$ and $p\sqrt{2}$ if $p \equiv 5, 7 \bmod 8$. We explain how to
compute this eigenvalue mod $\#E_{1,c}(\mathbb{F}_{p^2})$. We will use the formulas (16) and (17).


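An isogeny $I : E \to E'$ of degree $\ell$ is called horizontal if $(\mathrm{End}(E))_\ell \simeq (\mathrm{End}(E'))_\ell$.

If $p \equiv 1, 3 \bmod 8$, we obtain

\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (p + 1)^2 - D'm^2 \ \Rightarrow\ \sqrt{D'} \equiv (p + 1)/m, \]
\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (p - 1)^2 + 2n^2 \ \Rightarrow\ \sqrt{-2} \equiv (p - 1)/n, \]
\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (1 - t_{p^2}/2)^2 + 2D'(nm/2)^2 \ \Rightarrow\ \sqrt{-2D'} \equiv (2 - t_{p^2})/(nm). \]

If $p \equiv 5, 7 \bmod 8$, we obtain

\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (p - 1)^2 + D'm^2 \ \Rightarrow\ \sqrt{-D'} \equiv (p - 1)/m, \]
\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (p + 1)^2 - 2n^2 \ \Rightarrow\ \sqrt{2} \equiv (p + 1)/n, \]
\[ \#E_{1,c}(\mathbb{F}_{p^2}) = (1 - t_{p^2}/2)^2 + 2D'(nm/2)^2 \ \Rightarrow\ \sqrt{-2D'} \equiv (2 - t_{p^2})/(nm). \]

The eigenvalue of $\Phi_2$ on $E_{1,c}(\mathbb{F}_{p^2})$ is $p\sqrt{-2} \equiv p(p - 1)/n \bmod \#E_{1,c}(\mathbb{F}_{p^2})$ if
$p \equiv 1, 3 \bmod 8$ or $p\sqrt{2} \equiv p(p + 1)/n \bmod \#E_{1,c}(\mathbb{F}_{p^2})$ if $p \equiv 5, 7 \bmod 8$.

The eigenvalue of $\Phi_{D'}$ on $E_{1,c}(\mathbb{F}_{p^2})$ is $p\sqrt{D'} \equiv p(p + 1)/m \bmod \#E_{1,c}(\mathbb{F}_{p^2})$ if
$p \equiv 1, 3 \bmod 8$ or $p\sqrt{-D'} \equiv p(p - 1)/m \bmod \#E_{1,c}(\mathbb{F}_{p^2})$ if $p \equiv 5, 7 \bmod 8$.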
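The three expressions of the group order follow from $\#E_{1,c}(\mathbb{F}_{p^2}) = p^2 + 1 - t_{p^2}$ together with the relations $t_{p^2} \pm 2p$ of Lemma 1 and $4p = 2n^2 + D'm^2$. The sketch below recomputes them with plain integer arithmetic; it is an illustration with hypothetical function names, performs no primality test, and the small inputs used to exercise it are toy values rather than cryptographic parameters.

```python
def cm_parameters(n, m, d_prime):
    """Return (p, t, order) from 4p = 2n^2 + D'm^2 and the relations of Lemma 1."""
    four_p = 2 * n * n + d_prime * m * m
    if four_p % 4:
        raise ValueError("2n^2 + D'm^2 must be divisible by 4")
    p = four_p // 4
    if p % 8 in (1, 3):
        t = d_prime * m * m - 2 * p      # t + 2p = D'm^2
    elif p % 8 in (5, 7):
        t = 2 * n * n - 2 * p            # t + 2p = 2n^2
    else:
        raise ValueError("p must be odd")
    return p, t, p * p + 1 - t           # order = p^2 + 1 - t


def order_expressions(n, m, d_prime):
    """The three expressions of the order; the last one is returned multiplied by 4."""
    p, t, _ = cm_parameters(n, m, d_prime)
    sign = 1 if p % 8 in (1, 3) else -1
    first = (p + sign) ** 2 - sign * d_prime * m * m
    second = (p - sign) ** 2 + sign * 2 * n * n
    third_times_4 = (2 - t) ** 2 + 2 * d_prime * (n * m) ** 2
    return first, second, third_times_4
```

For instance, the toy input $n = 2$, $m = 1$, $D' = 20$ gives $p = 7$ and $t_{p^2} = -6$, and all three expressions evaluate to 56.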


3.3 Curve Construction and Examples 

We construct curves $E_{1,c}$ with good cryptographic properties (i.e. a large prime
divides the number of points of $E_{1,c}$ over $\mathbb{F}_{p^2}$) by using the complex multiplication
algorithm. More precisely, we look for prime numbers $p$ such that the complex
multiplication equation

\[ 4p = 2n^2 + D'm^2 \]

is verified. Once $p$ is found, we compute the roots of the Hilbert polynomial in
$\mathbb{F}_{p^2}$ to get the $j$-invariant of the curve $j(E_{1,c})$. We finally get the value of $c$ by
solving $j(E_{1,c}) = 2^6 \frac{(3c - 10)^3}{(c - 2)(c + 2)^2}$ and choosing a solution satisfying $c^2 \in \mathbb{F}_p$.
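The closed form $j(E_{1,c}) = 2^6 (3c-10)^3 / ((c-2)(c+2)^2)$ can be checked against the generic formula $j = 1728 \cdot 4a_4^3 / (4a_4^3 + 27a_6^2)$. The sketch below does this with exact rational arithmetic; it assumes the model $y^2 = x^3 + 27(3c-10)x + 108(14-9c)$ for $E_{1,c}$ (equation (11) with $c$ replaced by $-c$), and the function names are hypothetical.

```python
from fractions import Fraction


def j_invariant(a4, a6):
    """j-invariant of the short Weierstrass curve y^2 = x^3 + a4*x + a6."""
    num = 4 * a4 ** 3
    return 1728 * num / (num + 27 * a6 ** 2)


def j_of_e1c(c):
    """j-invariant of E_{1,c}, computed from its coefficients."""
    c = Fraction(c)
    return j_invariant(27 * (3 * c - 10), 108 * (14 - 9 * c))


def j_closed_form(c):
    """Closed form 2^6 (3c - 10)^3 / ((c - 2)(c + 2)^2); requires c != 2, -2."""
    c = Fraction(c)
    return 64 * (3 * c - 10) ** 3 / ((c - 2) * (c + 2) ** 2)
```

The two functions agree for every rational $c \neq \pm 2$; for $c = 0$ both return 8000, the $j$-invariant with complex multiplication by $\sqrt{-2}$.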

We note that for a bunch of discriminants (such as —20, —24, —36 etc.), Hilbert 
polynomial precomputation may be avoided by using parameterizations com- 
puted by Quer m- 

\[ C_t : y^2 = x^3 - 6(5 + 3\sqrt{t})x + 8(7 + 9\sqrt{t}), \tag{20} \]

for some teQ. For instance t = | for D = —20, t = | for D = —24 etc. Once p 
is found, one may directly reduce mod p the curve given by equation[20l Curves 
given by equation (1201) are Q-curves and for these discriminants, we obtain the 
same curves as in US- 

Complex multiplication algorithms may not be avoided in certain cryptographic
settings, such as pairing-friendly constructions. One advantage of the
construction is that one has the liberty to choose the value $r$ of the large prime
number dividing the curve group order. This helps in preventing certain attacks,
such as Cheon's attack [4] on the $q$-DH assumption. On the negative side, we
cannot construct curves with fixed $p$ (such as the attractive $2^{127} - 1$).

Using Magma, we computed an example with $p \equiv 5 \bmod 8$, $D = 40$, $D' = 20$.

Example 1. We first search 63-bit numbers $n, m$ such that $p = (2n^2 + 20m^2)/4$
is prime and $\#E_{1,c}(\mathbb{F}_{p^2})$ is almost prime. We can expect an order of the form $4r$,
with $r$ prime. In a few seconds, we find the following parameters.

n = 0x55d23edfa6a1f7e4
m = 0x549906b3eca27851
t_{p^2} = -0xfaca844b264dfaa353355300f9ce9d3a
p = 0x9a2a8c914e2d05c3f2616cade9b911ad
r = 0x1735ce0c4fbac46c2245c3ce9d8da0244f9059ae9ae4784d6b2f65b29c444309
c^2 = 0x40b634aec52905949ea0fe36099cb21a

with $r, p$ prime and $\#E_{1,c}(\mathbb{F}_{p^2}) = 4r$.

We use Vélu's formulas to compute a degree-5 isogeny from $E_{1,c}$ into $E_{b,5}$. We
find a 5-torsion point $P_5(X_5, Y_5)$ on $E_{1,c}(\mathbb{F}_{p^8})$. The function IsogenyFromKernel
in Magma evaluated at $(E_{1,c}(\mathbb{F}_{p^8}), (X - X_{P_5})(X - X_{2P_5}))$ outputs a curve $E_{b,5} :
y_b^2 = x_b^3 - 25 \cdot 27(3c + 10)x_b + 125 \cdot 108(9c + 14)$. The curve $E_{b,5}$ is isomorphic
to $E_{1,-c}$ over $\mathbb{F}_{p^2}$ through $i_{\sqrt{5}} : (x_b, y_b) \mapsto (x_b/5,\ y_b/(5\sqrt{5}))$. The above function
outputs also the desired isogeny with coefficients in $\mathbb{F}_{p^2}$:

: 

E hc (¥ p 2) -> £; 6 , 5 (F P 2 ) 


-2 3 .3 4 ((9c+16)x 2 +|11(27c+64)x+|3 3 (53c+80) 
” l (x 2 +fh:*-||G-|-1623 2 


y 


-2 4 .3 4 ((9e+16)x 3 +§ll(27c+64)x 2 + g3 4 (53c+80)x+^3 2 (4419c+13360)) 
(x 2 + - 2 -cx- ts c+162) 3 


, 2-3 3 ( f (13c+40)x 2 +2 3 (27c+28)x+2f (369c+1768)) 
+ (z 2 + ¥cx-fic+162)2 

We finally obtain a second computable endomorphism
by composing $\pi_p \circ i_{\sqrt{5}} \circ I_5$.

4 Four-Dimensional GLV on $E_{2,c}(\mathbb{F}_{p^2})$

The construction of two efficiently computable endomorphisms on $E_{2,c}$, with
degree of inseparability $p$, is similar to the one we gave for $E_{1,c}$.

We consider the elliptic curve given by eq. © in the reduced form: 

\[ E_{2,c}(\mathbb{F}_{p^2}) : y^2 = x^3 + 3(2c - 5)x + c^2 - 14c + 22. \tag{22} \]

We assume that $c \in \mathbb{F}_{p^2} \setminus \mathbb{F}_p$, $c^2 \in \mathbb{F}_p$, $c$ is not a cube in $\mathbb{F}_{p^2}$. In this case the
isogeny (5) between $J_{C_2}$ and $E_{2,c} \times E_{2,-c}$ is defined over $\mathbb{F}_{p^6}$. The 3-torsion
subgroup $E_{2,c}(\mathbb{F}_{p^2})[3]$ contains the order 3 subgroup $\{\mathcal{O}, (3, c + 2), (3, -c - 2)\}$.
We compute an isogeny whose kernel is this 3-torsion subgroup. With Vélu's
formulas we obtain the curve $E_b : y^2 = x^3 - 27(2c + 5)x - 27(c^2 + 14c + 22)$. The
curve $E_b$ is isomorphic to $E_{2,-c}(\mathbb{F}_{p^2}) : y^2 = x^3 - 3(2c + 5)x + c^2 + 14c + 22$,
via the isomorphism $(x, y) \mapsto (x/(-3),\ y/(-3\sqrt{-3}))$. We define the isogeny


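\[ I_3 : E_{2,c} \to E_{2,-c}, \qquad (x, y) \mapsto \left( -\frac{1}{3} \left( x + \frac{12(c + 2)}{x - 3} + \frac{4(c + 2)^2}{(x - 3)^2} \right),\ \frac{-y}{3\sqrt{-3}} \left( 1 - \frac{12(c + 2)}{(x - 3)^2} - \frac{8(c + 2)^2}{(x - 3)^3} \right) \right). \]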


Finally, we observe that $\pi_p(c) = -c$ and $\pi_p(j(E_{2,c})) = j(E_{2,-c})$. This implies
that $E_{2,c}$ and $E_{2,-c}$ are isogenous through the Frobenius map $\pi_p$. We obtain the
isogeny $\Phi_3 = I_3 \circ \pi_p$ which is given by the following formula


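We compute formally $\Phi_3^2$ and obtain $\Phi_3^2 = \pi_{p^2} \circ [\pm 3]$. There is a term $\sqrt{-3}\,\sqrt{-3}^{\,p}$
in the $y$-side of $\Phi_3^2$. We observe that if $p \equiv 1 \bmod 3$, then $\left(\frac{-3}{p}\right) = 1$, $\sqrt{-3}\,\sqrt{-3}^{\,p} =
-3$ and $\Phi_3^2 = \pi_{p^2} \circ [-3]$. Similarly, if $p \equiv 2 \bmod 3$, then $\Phi_3^2 = \pi_{p^2} \circ [3]$. We conclude
that for points defined over $\mathbb{F}_{p^2}$, we have

\[ \Phi_3^2 \pm 3 = 0. \]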


Guillevic and Vergnaud [13, Theorem 2] showed that the complex multiplication
discriminant is of the form $3D'$. With the same arguments as for $E_{1,c}$ (now with $\deg(\Phi_3) = 3p$), we
deduce that there are integers $m$ and $n$ such that if $p \equiv 1 \pmod 3$, then

\[ t_{p^2} + 2p = D'm^2 \quad \text{and} \quad t_{p^2} - 2p = -3n^2, \]

and if $p \equiv 2 \pmod 3$, then

\[ t_{p^2} + 2p = 3n^2 \quad \text{and} \quad t_{p^2} - 2p = -D'm^2. \]


As a consequence, we have the following theorem, whose proof is similar to the
proof of Theorem 1.

Theorem 2. Let $E_{2,c}$ be an elliptic curve given by equation (22), defined over
$\mathbb{F}_{p^2}$. Let $-D$ be the complex multiplication discriminant and consider $D'$ such
that $D = 3D'$. There is an endomorphism $\Phi_{D'}$ of $E_{2,c}$ with degree of separability
$D'$. The characteristic equation of this endomorphism is

\[ \Phi_{D'}^2 - D'm\,\Phi_{D'} + D'p\,\mathrm{Id} = 0. \tag{23} \]

We have thus proven that $\Phi = \Phi_3$ and $\Psi = \Phi_{D'}$, viewed as algebraic integers,
generate different quadratic extensions of $\mathbb{Q}$. As a consequence, we obtain a
four-dimensional GLV algorithm on $E_{2,c}$.

5 Four-Dimensional GLV on $J_{C_1}$ and $J_{C_2}$

The first endomorphism $\Psi$ on $J_{C_1}$ is induced by the curve automorphism $(x, y) \mapsto
(-x, iy)$, with $i$ a square root of $-1$. Its characteristic polynomial is $X^2 + 1$.
On $J_{C_2}$ we consider $\Psi$, the endomorphism induced by the curve automorphism
$(x, y) \mapsto (\zeta_3 x, y)$. Its characteristic polynomial is $X^2 + X + 1$. The second endomorphism
is constructed as $\Phi = \hat{I}(\Phi_{D'} \times \Phi_{D'})I$, where $\Phi_{D'}$ is the elliptic curve
endomorphism constructed in Theorem 1. In order to compute the characteristic
equation for $\Phi$, we follow the lines of the proof of Theorem 1 in [TJ]]. We
reproduce the computation for the Jacobian of $C_1$.

Theorem 3. Let $C_1 : y^2 = x^5 + ax^3 + bx$ be a hyperelliptic curve defined over
$\mathbb{F}_p$ with ordinary Jacobian and let $r$ be a prime number such that $r \,\|\, \#J_{C_1}(\mathbb{F}_p)$. Let
$I : J_{C_1} \to E_{1,c} \times E_{1,c}$ be the (2,2)-isogeny defined by equation (@J and assume $I$ is
defined over an extension field of degree $k > 1$. We define $\Phi = \hat{I}(\Phi_{D'} \times \Phi_{D'})I$,
where $\Phi_{D'}$ is the endomorphism defined in Theorem 1. Then

1. For $P \in J_{C_1}[r](\mathbb{F}_p)$, we have $\Phi(P) = [\lambda]P$, with $\lambda \in \mathbb{Z}$.

2. The characteristic equation of $\Phi$ is $\Phi^2 - 2D'm\,\Phi + 4D'p\,\mathrm{Id} = 0$.

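Proof. 1. Note that $\mathrm{End}(J_{C_1})$ is commutative, and $\Phi$ is defined over $\mathbb{F}_p$ (see [2,
Prop. III.1.3]). Hence, for $\mathcal{D} \in J_{C_1}[r](\mathbb{F}_p)$ we have that $\pi(\Phi(\mathcal{D})) = \Phi(\pi(\mathcal{D})) =
\Phi(\mathcal{D})$. Since there is only one subgroup of order $r$ in $J_{C_1}(\mathbb{F}_p)$, we obtain that
$\Phi(\mathcal{D}) = \lambda\mathcal{D}$.

2. Since $I\hat{I} = [2]$ then

\[ \Phi^2 = \hat{I}(\Phi_{D'} \times \Phi_{D'})\, I\hat{I}\, (\Phi_{D'} \times \Phi_{D'})I = 2\hat{I}(\Phi_{D'}^2 \times \Phi_{D'}^2)I. \tag{24} \]

Since $\Phi_{D'}$ verifies the equation

\[ \Phi_{D'}^2 - D'm\,\Phi_{D'} + D'p\,\mathrm{Id} = 0, \tag{25} \]

we have

\[ [2]\hat{I}\left( (\Phi_{D'}^2, \Phi_{D'}^2) - D'm\,(\Phi_{D'}, \Phi_{D'}) + D'p\,(\mathrm{Id}, \mathrm{Id}) \right) I = \mathcal{O}_{J_{C_1}}. \]

Using equation (24) and $\hat{I}I = [2]$, we conclude that $\Phi^2 - 2D'm\,\Phi + 4D'p\,\mathrm{Id} = 0$.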

5.1 Computing $I$ on $J_{C_1}(\mathbb{F}_p)$

We show first how to compute explicitly the (2,2)-isogeny on $J_{C_1}(\mathbb{F}_p)$ with only a
small number of operations over extension fields of $\mathbb{F}_p$.

Let $\mathcal{D}$ be a divisor in $J_{C_1}(\mathbb{F}_p)$ given by its Mumford coordinates

\[ \mathcal{D} = [U, V] = [T^2 + u_1 T + u_0,\ v_1 T + v_0], \qquad u_0, u_1, v_0, v_1 \in \mathbb{F}_p. \]

It corresponds to two points $P_1(X_1, Y_1), P_2(X_2, Y_2) \in C_1(\mathbb{F}_p)$ or $C_1(\mathbb{F}_{p^2})$. We have

\[ u_1 = -(X_1 + X_2), \quad u_0 = X_1 X_2, \quad v_1 = \frac{Y_2 - Y_1}{X_2 - X_1}, \quad v_0 = \frac{X_1 Y_2 - X_2 Y_1}{X_1 - X_2}. \]

Explicit formula to compute $\varphi_{1*}(P_1) + \varphi_{1*}(P_2)$. Let $\varphi_{1*}(P_1) = (x_{1,1}, y_{1,1})$ and
$\varphi_{1*}(P_2) = (x_{2,1}, y_{2,1})$. In the following we give the formulas to compute
$S_1(x_{3,1}, y_{3,1}) = \varphi_{1*}(P_1) + \varphi_{1*}(P_2)$.


*3,1 = 3^ ~ ( Xl > 


_ 2 [(«o«i-«i«o)«i-i'o«o] + [3(«o«i-wi«o)] v / 6+ [3uo] \/6+ [ui] \A> 3 


We denote Ai = A-yj \/b. The computation of the numerator of A\ costs 4 M p and 
the denominator costs S p + M p . We will use the Jacobian coordinates for Si: 
2:3,1 = Xs^i/Z^ i, 1/3,1 = Y^\jZ\ x to avoid inversion in F p 4. We continue with 


_ o ( [“0 + 6 ] + [m?-6«o] Vb) ( [«o+6]+ [— 2uo] Vb) 
1_ ([ U ,_ 6 ] + [„ 0 U 1 ]^ + [_ U 1 ]V 5 )^ 


As Mq was already computed in Ay, this costs one square {u\) and a multiplication 
in F p 2, hence S p + M p 2. The denominator is the same as the one of Af, that is, 


Then 


* 3,1 


v+(c+2) 

VbAi 

( a+ 2 Vb ) 


-(* 1,1 + * 2 ,l) + ^ 

-(*l,l+* 2 ,l) + 3 “; 2 ^- 


To avoid tedious computations, it is preferable to precompute both l/(a + 2 i/b) 
and ( 3 a — 10 Vb)/(a + 2 Vb) with one inversion in F p 2 and one multiplication in 
F p 2. 

Computing \fbAf is done by shifting to the right coefficients and costs one 
multiplication by b (as Af € F p 4 ). Then \/bA\ ■ (a + 2 yfb )~ 1 costs 2 M p 2 . Finally 
we need to compute • Z\ which costs S p 4 + 2 M p 2 . The total cost of A 3 ; i, 

^3,1 and Z‘f | is QM P + 2 S p + 5 M p 2 + S p 4. 

Computing $y_{3,1}$ is quite complicated because we deal with divisors so we do
not have directly the coefficients of the two points. We use this trick:

\[ y_{3,1} = \lambda_1 (x_{1,1} - x_{3,1}) - y_{1,1} \]
\[ y_{3,1} = \lambda_1 (x_{2,1} - x_{3,1}) - y_{2,1} \]
\[ 2y_{3,1} = \lambda_1 (x_{1,1} + x_{2,1} - 2x_{3,1}) - (y_{1,1} + y_{2,1}) \]
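The point of the trick is that the last identity only involves the symmetric quantities $x_1 + x_2$ and $y_1 + y_2$, which can be expressed from the Mumford coordinates of the divisor without knowing the two points individually. As a sanity check, the sketch below compares the usual chord formula with the symmetric form on a small curve over a prime field. The curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{101}$ and the function names are illustrative assumptions, not objects from the paper.

```python
def affine_points(a4, a6, p):
    """Brute-force list of affine points of y^2 = x^3 + a4*x + a6 over F_p."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a4 * x + a6)) % p == 0]


def chord_add(P1, P2, p):
    """Chord addition of two points with distinct x-coordinates; returns (lambda, P3)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return lam, (x3, y3)


def y3_symmetric(lam, P1, P2, x3, p):
    """y3 from 2*y3 = lam*(x1 + x2 - 2*x3) - (y1 + y2), using only symmetric sums."""
    (x1, y1), (x2, y2) = P1, P2
    two_y3 = (lam * (x1 + x2 - 2 * x3) - (y1 + y2)) % p
    return two_y3 * pow(2, -1, p) % p
```

Both expressions give the same $y_3$ whenever $x_1 \neq x_2$; the symmetric one is the form used when only $u_0, u_1, v_0, v_1$ are known.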


Since 2q,i + 2:2,1 was already computed for 2:3,1, getting (2:1,1 + 2:2,1 — 22:3,1) 
costs only additions. We multiply the numerators of Ai and (2:1,1 +*2,1 — 22:3,1) 
which costs lM p 4 . The denominator is Zv , and as Z\ x is already computed, 
this costs lM p i. The numerator of (2/1,1 +2/2,1) contains products of uo, ui, vq, v\ 
previously computed and its denominator is simply Z\. The total cost of 2/3,1 is 
then 2M p i. Finally, computing (*3,1, 2/3,1) costs 
Now we show that computing $S_2(x_{3,2}, y_{3,2})$ is free of cost. We notice that

\[ \varphi_1(X_j, Y_j) = \varphi_2(-X_j, iY_j) \]

with $i$ such that $i^2 = -1$ and $j \in \{1, 2\}$. Rewriting this equation in terms of
divisors, we derive that 


S2 (2:3, 2, 2/3,2) = • 

We can simply compute S2 with ipu : 


2:3,2 = 2;3,i([-ui,u 0 ,-toi, w 0 ]) with 
A2 = \i([-ui,u 0 ,-ivi,iv 0 ]) 

_ 2 i (vou-L-vxuoKu^^-vouo+Zy/bvQ-Wvi 

~\/b ( uo—Vb)(uo - yfbui+Vb) 


7 r p 2 (Ai) 


and 

(2:1,1 + X2,i){[-u u uo,-iv 1 ,iv 0 \) = 2 = 7 tp 2 ( 2:1,1 + 2:2,1) • 

We deduce that $x_{3,2} = \pi_{p^2}(x_{3,1})$, $y_{3,2} = \pi_{p^2}(y_{3,1})$ and

\[ \varphi_{2*}(\mathcal{D}) = \varphi_{2*}(P_1) + \varphi_{2*}(P_2) = \pi_{p^2}(\varphi_{1*}(P_1) + \varphi_{1*}(P_2)). \]

Computing $(x_{3,2}, y_{3,2})$ costs two Frobenius $\pi_{p^2}$ which are performed with four
negations on $\mathbb{F}_{p^2}$.

5.2 Computing Endomorphisms on $E_{1,c}$

Here we apply the endomorphism $\Phi_{D'}$ on $S_1(x_{3,1}, y_{3,1})$. As $\Phi_{D'}$ is defined over
$\mathbb{F}_{p^2}$, it commutes with $\pi_{p^2}$ hence $\Phi_{D'}(x_{3,2}) = \pi_{p^2}(\Phi_{D'}(x_{3,1}))$ is free. Unfortunately
$S_1$ has coefficients in $\mathbb{F}_{p^4}$ hence we need to perform some multiplications
in $\mathbb{F}_{p^4}$. More precisely, $y_{3,1}$ is of the form $\sqrt[8]{b}\,y'_{3,1}$ with $y'_{3,1} \in \mathbb{F}_{p^4}$. As the endomorphism
is of the form $\Phi_{D'}(x, y) = (\Phi_{D',x}(x),\ y\,\Phi_{D',y}(x))$ the $\sqrt[8]{b}$ factor of $y_{3,1}$ is
not involved in the endomorphism computation.


5.3 Computing $\hat{I}$ on $J_{C_1}(\mathbb{F}_p)$

Then we go back to ,Jc, ■ We compute the divisor of these two points (with 
Sy/xSJ) on Jc L and get 

^(2:3,1,2/34) =T 2 — 2 Vb^T + Vb, (g^r- <fi) • 

If (2:3,1, 2/3,1) is in Jacobian coordinates (X3-1 , I31 , ^3,1) then we compute = 

X3,l+Kl 

X 3 , 1 —Z'i i i ‘ 
A similar computation gives

¥>2 (*3,2, 2/3,2) = T 2 +2</b^^T + Vb, (j^T+</b) . 

Since £3,2 = Kjp(xa t i) and 2/3,2 = 7 iy (2/3,1), we have 

<P% (* 3 , 2 , 2/3,2) = T 2 + 2 </b l^ T + Vb, fc 1 S)-? r +^) ■ 

Hence (*3,2, 2/3,2) = 7r p 2 (^i(*3,i,?/3,i))- 
Finally, we have 

+ </>2*( P 2)) = + ¥>1*(-P2)))) ■ 

and, with similar arguments, 


^d'MPi) + ^(P2))) = OMGMA) + M^))))) • 

The computation of the sum $\varphi_1^*(\Phi_{D'}(\varphi_{1*}(\mathcal{D}))) + \pi_{p^2} \circ \varphi_1^*(\Phi_{D'}(\varphi_{1*}(\mathcal{D})))$ involves
terms in $\mathbb{F}_{p^4}$ but thanks to its special form, we need to perform the operations
in $\mathbb{F}_{p^2}$ only. We give the table of computations in Appendix A and show that
most multiplications are performed over $\mathbb{F}_{p^2}$. We have followed computations for
a multiplication in Mumford coordinates provided in [5].

We conclude that applying $\varphi_{1*}(P_1) + \varphi_{1*}(P_2)$ costs roughly as much as an
addition on $J_{C_1}$ over $\mathbb{F}_p$, while $\varphi_{2*}(P_1) + \varphi_{2*}(P_2)$ is cost free. Computing $\Phi_{D'}$ depends
on the size of $D'$ and costs a few multiplications over $\mathbb{F}_{p^4}$. Finally adding $\varphi_1^* + \varphi_2^*$
costs roughly an addition of divisors over $\mathbb{F}_{p^2}$.


6 Complexity Analysis and Comparison to GLS-GLV Curves

We explain that our construction is valid for GLS curves with discriminants
-3 and -4. These curves are particularly interesting for cryptography, because
their simple equation forms result in simple and efficient point additions. A
four-dimensional GLV algorithm on these curves was proposed by Longa and
Sica JTB]. Although the endomorphisms we construct do not allow one to derive
a higher dimension algorithm, they offer an alternative to Longa and Sica's
construction.


The Case D = -4. We consider a curve with CM discriminant $D = -4$,
defined over $\mathbb{F}_{p^2}$, with $p \equiv 1 \bmod 8$. Assume that the curve is of the form
$E_\alpha(\mathbb{F}_{p^2}) : y^2 = x^3 + \alpha x$ with $\alpha \in \mathbb{F}_{p^2}$. A 2-torsion point is $P_2(0, 0)$. Using Vélu's
formulas, we get the isogeny with kernel generated by $P_2$, whose equation is

\[ (x, y) \mapsto \left( \frac{x^2 + \alpha}{x},\ y\,\frac{x^2 - \alpha}{x^2} \right). \]

This isogeny sends points on $E_\alpha$ on the curve $E_b : y^2 = x^3 - 4\alpha x$. We use the
same trick as previously. If $\alpha \in \mathbb{F}_{p^2}$ is such that $\pi_p(\alpha) = \alpha^p = -\alpha$ (this is the
case for example if $\alpha = \sqrt{a}$ with $a \in \mathbb{F}_p$ a non-square) then by composing with
$(x_b, y_b) \mapsto (x_b^p/(-2),\ y_b^p/(-2\sqrt{-2}))$, we get an endomorphism $\Phi_2$. Note that
$\sqrt{-1} \in \mathbb{F}_p$ since $p \equiv 1 \bmod 8$. We obtain


0 2 : E a { F p2 ) -»■ E a (¥ p 2 


(%> y ) >- 


w)) 


if (x,y) = (0,0), 
otherwise. 


We obtained an endomorphism $\Phi_2$ such that $\Phi_2^2 - 2 = 0$, when restricted to
points defined over $\mathbb{F}_{p^2}$. The complex multiplication endomorphism $\Phi$ on $E_\alpha$ is
$(x, y) \mapsto (-x, iy)$ and verifies the equation $\Phi^2 + 1 = 0$. The 4-dimensional GLV
algorithm of Longa and Sica on this curve uses an endomorphism $\Psi$ such that
$\Psi^4 + 1 = 0$. With our method we obtain two distinct endomorphisms, but the
three ones $\Phi, \Phi_2, \Psi$ are not “independent” on the subgroup $E(\mathbb{F}_{p^2}) \setminus E[2]$. Indeed,
we have $\Phi_2 + \Phi\Phi_2 = 2\Psi$.

Note that in this case the corresponding Jacobian splits into two isogenous
elliptic curves over $\mathbb{F}_p$, namely the two quartic twists defined over $\mathbb{F}_p$ of $E_{1,c}$.


The Case D = -3. We consider the curve $E_\beta$ whose Weierstrass equation is

\[ y^2 = x^3 + \beta, \tag{26} \]

where $\beta^2 \in \mathbb{F}_p$. Our construction yields the following efficiently computable endomorphism

&3(x,y) 


x 2 p ) 



When restricted to points defined over $\mathbb{F}_{p^2}$, this endomorphism verifies the equation
$\Phi_3^2 - 3 = 0$, while the complex multiplication endomorphism $\Phi$ has characteristic
equation $\Phi^2 + \Phi + 1 = 0$. Longa and Sica's algorithm uses the complex
multiplication $\Phi$ and an endomorphism $\Psi$ verifying $\Psi^2 + 1 = 0$ for points defined
over $\mathbb{F}_{p^2}$. We observe that $\Phi_3\Psi - 1 = 2\Phi$.

We give in Table 1 the operation count of a computation of one scalar multiplication
using two-dimensional and four-dimensional GLV on $E_{1,c}$ and $E_\beta$ given by
equation (26). We denote by $m, s$ and by $M, S$ the cost of multiplication and squaring
over $\mathbb{F}_p$ and over $\mathbb{F}_{p^2}$, respectively. We denote by $c$ the cost of multiplication by
a constant in $\mathbb{F}_{p^2}$. In order to give global estimates, we will assume that $m \sim s$ and
that $M \sim 3m$ and $S \sim 3s$. Additions in $\mathbb{F}_p$ are not completely negligible compared
to multiplications, but we do not count additions here. We counted operations by
using formulae from Bernstein and Lange's database [1] for addition and doubling
in projective coordinates. On the curve $E_{1,c}$ addition costs $12M + 2S$, while doubling
costs $5S + 6M + 1c$. For $E_\beta$, addition costs $12M + 2S$, while doubling is
$3M + 5S + 1c$. Note that by using Montgomery's simultaneous inversion method, we
could also obtain all points in the look-up table in affine coordinates and use mixed
additions for the addition step of the scalar multiplication algorithm. This variant
adds one inversion and $3(n - 1)$ multiplications, where $n$ is the length of the look-up
table. We believe this is interesting for implementations of cryptographic applications
which need to perform several scalar multiplications. For genus 2 arithmetic
on curves of the form $y^2 = x^5 + ax^3 + bx$, we used formulae given by Costello and
Lauter [5] in projective coordinates. An addition costs $43M + 4S$ and a doubling
costs $30M + 9S$.


Table 1. Total cost of scalar multiplication at a 128-bit security level

| Curve      | Method         | Operation count | Global estimation |
| E_{1,c}    | 4-GLV, 16 pts. | 1168M + 440S    | 4797m             |
| E_β        | 4-GLV, 16 pts. | 976M + 440S     | 4248m             |
| E_{1,c}    | 2-GLV, 4 pts.  | 2048M + 832S    | 8640m             |
| E_β        | 2-GLV, 4 pts.  | 1664M + 832S    | 7488m             |
| J_{C_1}    | 4-GLV, 16 pts. | 4500m + 816s    | 5316m             |
| J_{C_1}    | 2-GLV, 4 pts.  | 7968m + 1536s   | 9504m             |
| FKT [3]    | 4-GLV, 16 pts. | 4500m + 816s    | 5316m             |
| Kummer [3] |                | 3328m + 2304s   | 5632m             |


The practical gain of the 4-dimensional GLV on $E_{1,c}$, when compared to the
2-dimensional GLV method, is 44%. Curves with discriminant -3, defined over
$\mathbb{F}_{p^2}$, which belong both to the family of curves we propose and to the one proposed
by Longa and Sica, offer a 12% speed-up, thanks to their efficient arithmetic.
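The global estimates follow from the operation counts through the stated conventions $m \sim s$, $M \sim 3m$ and $S \sim 3s$. The short sketch below redoes this conversion; the row labels and function names are hypothetical, and the data are the operation counts and estimates listed in Table 1.

```python
# (curve, method, multiplications, squarings, ratio, tabulated global estimate)
# ratio = 3 converts F_{p^2} operations (M, S) into F_p multiplications;
# ratio = 1 is used for counts that are already expressed over F_p (m, s).
TABLE_ROWS = [
    ("E_1,c", "4-GLV", 1168, 440, 3, 4797),
    ("E_beta", "4-GLV", 976, 440, 3, 4248),
    ("E_1,c", "2-GLV", 2048, 832, 3, 8640),
    ("E_beta", "2-GLV", 1664, 832, 3, 7488),
    ("J_C1", "4-GLV", 4500, 816, 1, 5316),
]


def global_estimate(mults, squarings, ratio):
    """Cost in F_p multiplications, assuming a squaring costs as much as a multiplication."""
    return ratio * (mults + squarings)


def relative_gain(slow, fast):
    """Fraction of the cost saved when going from `slow` to `fast`."""
    return 1 - fast / slow
```

Under these conventions the rows for $E_\beta$, the 2-GLV row for $E_{1,c}$ and the Jacobian row reproduce the tabulated values exactly, and `relative_gain(8640, 4797)` is about 0.44. The 4-GLV row for $E_{1,c}$ evaluates to 4824m with this rule rather than the tabulated 4797m, so that entry presumably relies on a slightly different count.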

7 Conclusion 

We have studied two families of elliptic curves defined over $\mathbb{F}_{p^2}$ which have the
property that the Weil restriction is isogenous over $\mathbb{F}_p$ to the Jacobian of a
genus 2 curve. We have proposed a four-dimensional GLV algorithm on these
families of elliptic curves and on the corresponding Jacobians of genus 2 curves.
Our complexity estimates show that these abelian varieties offer efficient scalar
multiplication, competitive with GLV algorithms on other families in the literature
having two efficiently computable and “independent” endomorphisms.
A Appendix 1

Following [5], we explain here the addition step of two divisors in the isogeny
computation in Section 5.3. We denote by $m_n$ and $s_n$ the cost of multiplication
and squaring, respectively, in an extension field $\mathbb{F}_{p^n}$.

CTl = Ul + 7T p 2(ui), A 0 =Vq- 7Tp2 (vo), A\ = V\ — 7T p 2(ui), C7"i = u\ (1 m 4 ) 

Mi =u\ — 7 T P 2 (u?) ,M 2 = Vb(ir p2 {ui) - m), M 3 = u x - 7 r p 2 (ui); 

I 2 = 2(M 2 • A\ + Ao ■ Mi ); l 3 = Ao ■ M 3 ; d = —2 M 2 • M 3 ] (4m 2 ) 

A = l/(d ■ Z3); B = d ■ A, C = d ■ B; D = I 2 ■ B; (3m 2 +lm 4 ) 

E = l%- A-CC = C 2 ; u'{ = 2 • D - CC - a 1 (lm 2 +2s 2 ) 

u'q = D 2 + C ■ (ur + 7r p 2 (vi)) - (( u > - CC)- Ul + {Ul + 7T P 2 {Ui)))/2 (2to 2 +1s 4 ) 

Of = 7 Tp»(t ij) • Uq'; v" = D ■ (ui — u") +u 2 - u'q - Ui\ (2m 4 +ls 4 ) 

v'o = D - (u 0 - u'o) + Uq ]v" = -{E ■ v'l + Ul); v'l = -{E ■ v + u 0 ); (3m 4 ) 
Abstract. The classic Leftover Hash Lemma (LHL) is often used to
argue that certain distributions arising from modular subset-sums are 
close to uniform over their finite domain. Though very powerful, the 
applicability of the leftover hash lemma to lattice based cryptography is 
limited for two reasons. First, typically the distributions we care about in 
lattice-based cryptography are discrete Gaussians, not uniform. Second, 
the elements chosen from these discrete Gaussian distributions lie in an 
infinite domain: a lattice rather than a finite field. 

In this work we prove a “lattice world” analog of LHL over infinite 
domains, proving that certain “generalized subset sum” distributions 
are statistically close to well behaved discrete Gaussian distributions, 
even without any modular reduction. Specifically, given many vectors
$\{x_i\}_{i=1}^m$ from some lattice $L \subset \mathbb{R}^n$, we analyze the probability distribution
$\sum_{i=1}^m z_i x_i$ where the integer vector $z \in \mathbb{Z}^m$ is chosen from a discrete
Gaussian distribution. We show that when the $x_i$'s are “random enough”
and the Gaussian from which the $z$'s are chosen is “wide enough”, then
the resulting distribution is statistically close to a near-spherical discrete
Gaussian over the lattice $L$. Beyond being interesting in its own
right, this “lattice-world” analog of LHL has applications for the new
construction of multilinear maps [S] , where it is used to sample Discrete
Gaussians obliviously. Specifically, given encoding of the $x_i$'s, it is used
to produce an encoding of a near-spherical Gaussian distribution over
the lattice. We believe that our new lemma will have other applications, 
and sketch some plausible ones in this work. 

1 Introduction 

The Leftover Hash Lemma (LHL) is a central tool in computer science, stating 
that universal hash functions are good randomness extractors. In a characteristic 
application, the universal hash function may often be instantiated by a simple 
inner product function, where it is used to argue that a random linear combina- 
tion of some elements (that are chosen at random and then fixed “once and for 
all”) is statistically close to the uniform distribution over some finite domain. 
Though extremely useful and powerful in general, the applicability of the leftover
hash lemma to lattice-based cryptography is limited for two reasons. First,
typically the distributions we care about in lattice-based cryptography are discrete
Gaussians, not uniform. Second, the elements chosen from these discrete
Gaussian distributions lie in an infinite domain: a lattice rather than a finite
field.

The study of discrete Gaussian distributions underlies much of the advances in 
lattice-based cryptography over the last decade. A discrete Gaussian distribution 
is a distribution over some fixed lattice, in which every lattice point is sampled with 
probability proportional to its probability mass under a standard (n-dimensional) 
Gaussian distribution. Micciancio and Regev have shown in [TU] that these distri- 
butions share many of the nice properties of their continuous counterparts, and 
demonstrated their usefulness for lattice-based cryptography. Since then, discrete 
Gaussian distributions have been used extensively in all aspects of lattice-based 
cryptography (most notably in the famous “Learning with Errors” problem and 
its variants M)- Despite their utility, we still do not understand discrete Gaussian 
distributions as well as we do their continuous counterparts. 

A Gaussian Leftover Hash Lemma for Lattices? 

The LHL has been applied often in lattice-based cryptography, but sometimes 
awkwardly. As an example, in the integer-based fully homomorphic encryption 
scheme of van Dijk et al. [T5] , ciphertexts live in the lattice Z. Roughly speaking, 
the public key of that scheme contains many encryptions of zero, and encryption 
is done by adding the plaintext value to a subset-sum of these encryptions of 
zero. To prove security of this encryption method, van Dijk et al. apply the 
leftover hash lemma in this setting, but at the cost of complicating their
encryption procedure by reducing the subset-sum of ciphertexts modulo a single
large ciphertext, so as to bring the scheme back into the realm of finite rings
where the leftover hash lemma is naturally applied.¹ It is natural to ask whether
that scheme remains secure also without this artificial modular reduction, and 
more generally whether there is a more direct way to apply the LHL in settings 
with infinite rings. 

As another example, in the recent construction of multilinear maps 0, Garg 
et. al. require a procedure to randomize “encodings” to break simple algebraic 
relations that exist between them. One natural way to achieve this randomization 
is by adding many random encodings of zero to the public parameters, and 
adding a random linear combination of these to re-randomize a given encoding 
(without changing the encoded value). However, in their setting, there is no 
way to “reduce” the encodings so that the LHL can be applied. Can they argue 
that the new randomized encoding yields an element from some well behaved 
distribution? 

In this work we prove an analog of the leftover hash lemma over lattices,
yielding positive answers to the questions above. We use discrete Gaussian
distributions as our notion of “well behaved” distributions. Then, for $m$ vectors
$\{x_i\}_{i \in [m]}$ chosen “once and for all” from an $n$ dimensional lattice $L \subset \mathbb{R}^n$,
and a coefficient vector $z$ chosen from a discrete Gaussian distribution over the
integers, we give sufficient conditions under which the distribution $\sum_{i=1}^m z_i x_i$ is
“well behaved.”


1 Once in the realms of finite rings, one can alternatively use the generic proof of
Rothblum [15], which also uses the LHL.

Oblivious Gaussian Sampler 

Another application of our work is in the construction of an extremely simple 
discrete Gaussian sampler [6,13]. Such samplers, that sample from a spherical
discrete Gaussian distribution over a lattice have been constructed by [5] (using 
an algorithm by Klein H) as well as Peikert m- Here we consider a much sim- 
pler discrete Gaussian sampler (albeit a somewhat imperfect one). Specifically, 
consider the following sampler. In an offline phase, for m> n, the sampler sam- 
ples a set of short vectors x\, x %, . . . , x m from L - e.g., using GPV or Peikert’s 
algorithm. Then, in the online phase, the sampler generates z £ Z m according to 
a discrete Gaussian and simply outputs YhLi z % x i- But does this simpler sam- 
pler work - i.e., can we say anything about its output distribution? Also, how 
small can we make the dimension m of z and how small can we make the entries 
of z? Ideally m would be not much larger than the dimension of the lattice and 
the entries of z have small variance - e.g., 0(s/n). 
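
The two-phase sampler described above can be sketched as follows. This is only a minimal illustration under a simplifying assumption: the lattice is taken to be $L = \mathbb{Z}^n$, so the offline vectors can be drawn coordinate-wise. The helper names (`sample_dgauss_z`, `offline_phase`, `online_sample`) and the naive rejection sampler are hypothetical stand-ins, not the GPV or Peikert algorithms.

```python
import math
import random

def sample_dgauss_z(s, rng, tail=12):
    """Discrete Gaussian over Z with parameter s: Pr[x] ~ exp(-pi*x^2/s^2).
    Plain rejection sampling on [-tail*s, tail*s]; the far tails are ignored."""
    bound = int(math.ceil(tail * s))
    while True:
        x = rng.randint(-bound, bound)
        if rng.random() < math.exp(-math.pi * x * x / (s * s)):
            return x

def offline_phase(n, m, s, rng):
    """Draw m short vectors x_1..x_m from L = Z^n (assumed lattice).
    They form the columns of an n-by-m matrix X, stored as a list of rows."""
    return [[sample_dgauss_z(s, rng) for _ in range(m)] for _ in range(n)]

def online_sample(X, s_prime, rng):
    """Draw z from a discrete Gaussian over Z^m and output X*z = sum_i z_i x_i."""
    m = len(X[0])
    z = [sample_dgauss_z(s_prime, rng) for _ in range(m)]
    return [sum(row[i] * z[i] for i in range(m)) for row in X]

rng = random.Random(0)
n, m = 4, 16
X = offline_phase(n, m, s=3.0, rng=rng)
y = online_sample(X, s_prime=5.0, rng=rng)  # one lattice point in Z^n
```

The online phase uses only integer multiplications and additions of the stored vectors, which is the property that makes the sampler usable obliviously.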

A very useful property of such a sampler is that it can be made oblivious to 
an explicit representation of the underlying lattice, which makes it applicable 
easily within an additively homomorphic scheme. Namely, if you are given lattice 
points encrypted under an additively homomorphic encryption scheme, you can 
use them to generate an encrypted well behaved Gaussian on the underlying 
lattice. Previous samplers [6,13] are too complicated to use within an additively
homomorphic encryption scheme.²

Our Results 

In this work, we obtain a discrete Gaussian version of the LHL over infinite
rings. Formally, consider an n-dimensional lattice L and (column) vectors
$X = [x_1 | x_2 | \cdots | x_m] \in L^m$. We choose the $x_i$ according to a discrete
Gaussian distribution $\mathcal{D}_{L,s}$, where $\mathcal{D}_{L,s,c}$ is defined as

$$\mathcal{D}_{L,s,c}(x) = \frac{\rho_{s,c}(x)}{\rho_{s,c}(L)}, \qquad \rho_{s,c}(x) = \exp(-\pi\|x - c\|^2/s^2),$$

and $\rho_{s,c}(A)$ for a set A denotes $\sum_{x \in A} \rho_{s,c}(x)$.

Letting $z \leftarrow \mathcal{D}_{\mathbb{Z}^m,s'}$, we analyze the conditions under which the vector $X \cdot z$ is
statistically close to a “near-spherical” discrete Gaussian. Formally, consider:

$$\mathcal{E}_{X,s'} = \{X \cdot z : z \leftarrow \mathcal{D}_{\mathbb{Z}^m,s'}\}$$

Then, we prove that $\mathcal{E}_{X,s'}$ is close to a discrete Gaussian over L of moderate
“width”. Specifically, we show that for large enough $s'$, with overwhelming
probability over the choice of X:

1. $\mathcal{E}_{X,s'}$ is statistically close to the ellipsoid Gaussian $\mathcal{D}_{L,s'X^T}$ over L.

2. The singular values of the matrix X are of size roughly $s\sqrt{m}$, hence the
shape of $\mathcal{D}_{L,s'X^T}$ is “roughly spherical”. Moreover, the “width” of $\mathcal{D}_{L,s'X^T}$
is roughly $s's\sqrt{m} = \mathrm{poly}(n)$.

2 As noted by Peikert [13], one can generate an ellipsoidal Gaussian distribution over 
the lattice given a basis B by just outputting y <— B ■ z where z is a discrete 
Gaussian, but this ellipsoidal Gaussian distribution would typically be very skewed. 
We emphasize that it is straightforward to show that the covariance matrix
of $\mathcal{E}_{X,s'}$ is exactly $s'^2 XX^T$. However, the technical challenge lies in showing
that $\mathcal{E}_{X,s'}$ is close to a discrete Gaussian for a non-square X. Also note that
for a square X, the shape of the covariance matrix $XX^T$ will typically be very
“skewed” (i.e., the least singular value of $X^T$ is typically much smaller than the
largest singular value). We note that the “approximately spherical” nature of the
output distribution is important for performance reasons in applications such as
GGH: these applications must choose parameters so that the least singular value
of X “drowns out” vectors of a certain size, and the resulting vectors that they
draw from $\mathcal{E}_{X,s'}$ grow in size with the largest singular value of X, hence it is
important that these two values be as close as possible.

Our Techniques 

Our main result can be argued along the following broad outline. Our first theorem
(Theorem 2) says that the distribution of $X \cdot z \leftarrow \mathcal{E}_{X,s'}$ is indeed statistically
close to a discrete Gaussian over L, as long as $s'$ exceeds the smoothing parameter
of a certain “orthogonal lattice” related to X (denoted A). Next, Theorem 3
clarifies that A will have a small smoothing parameter as long as $X^T$ is “regularly
shaped” in a certain sense. Finally, we argue in Lemma 8 that when the
columns of X are chosen from a discrete Gaussian, $x_i \leftarrow \mathcal{D}_{L,S}$, then $X^T$ is
“regularly shaped,” i.e., has singular values all close to $\sigma_n(S)\sqrt{m}$.

The analysis of the smoothing parameter of the “orthogonal lattice” A is
particularly challenging and requires careful analysis of a certain “dual lattice”
related to A. Specifically, we proceed by first embedding A into a full rank lattice
$A_q$ and then move to study $M_q$, the (scaled) dual of $A_q$. Here we obtain a lower
bound on $\lambda_{n+1}(M_q)$, i.e., the (n+1)-th minimum of $M_q$. Next, we use a theorem
by Banaszczyk to convert the lower bound on $\lambda_{n+1}(M_q)$ to an upper bound
on $\lambda_{m-n}(A_q)$, obtaining m − n linearly independent, bounded vectors in $A_q$.
We argue that these vectors belong to A, thus obtaining an upper bound on
$\lambda_{m-n}(A)$. Relating $\lambda_{m-n}(A)$ to $\eta_\epsilon(A)$ using a lemma by Micciancio and Regev
completes the analysis. (We note that probabilistic bounds on the minima and
smoothing parameter of $A_q$, $M_q$ are well known in the case when the entries of
matrix X are uniformly random mod q (e.g. [6]), but here we obtain bounds in
the case when X has Gaussian entries significantly smaller than q.)

To argue that $X^T$ is regularly shaped, we begin with the literature of random
matrices which establishes that for a matrix $H \in \mathbb{R}^{m \times n}$, where each entry of H
is distributed as $\mathcal{N}(0, s^2)$ and m is sufficiently greater than n, the singular values
of H are all of size roughly $s\sqrt{m}$. We extend this result to discrete Gaussians,
showing that as long as each vector $x_i \leftarrow \mathcal{D}_{L,S}$ where S is “not too small” and
“not too skewed”, then with high probability the singular values of $X^T$ are all
of size roughly $s\sqrt{m}$.

Related Work 

Properties of linear combinations of discrete Gaussians have been studied before
in some cases by Peikert [12] as well as more recently by Boneh and Freeman [3].
Peikert’s “convolution lemma” (Theorem 3.1 in [13]) analyzes certain cases in
which a linear combination of discrete Gaussians yields a discrete Gaussian, in
the one dimensional case. More recently, Boneh and Freeman [3] observed that
under certain conditions, a linear combination of discrete Gaussians over a lattice
is also a discrete Gaussian. However, the deviation of the Gaussian needed to
achieve this is quite large. Related questions were considered by Lyubashevsky
[5] where he computes the expectation of the inner product of discrete Gaussians.

Discrete Gaussian samplers have been studied by [6] (who use an algorithm
by Klein) and [13]. These works describe a discrete Gaussian sampling algorithm
that takes as input a ‘high quality’ basis B for an n-dimensional lattice L and
outputs a sample from $\mathcal{D}_{L,s,c}$. In [6], $s > \|\tilde{B}\| \cdot \omega(\sqrt{\log n})$, where
$\|\tilde{B}\| = \max_i \|\tilde{b}_i\|$ and $\tilde{B}$ is the Gram-Schmidt orthogonalization of B. In contrast,
the algorithm of [13] requires $s > \sigma_1(B)$, i.e., the largest singular value of B, but
is fully parallelizable. Both these samplers take as input an explicit description
of a “high quality basis” of the relevant lattice, and the quality of their output
distribution is related to the quality of the input basis.

Peikert’s sampler [13] is elegant and its complexity is difficult to beat: the only
online computation is to compute $c - B_1 \lfloor B_1^{-1}(c - x_2) \rceil$, where c is the center of
the Gaussian, $B_1$ is the sampler’s basis for its lattice L, and $x_2$ is a vector that
is generated in an offline phase (freshly for each sampling) in a way designed
to “cancel” the covariance of $B_1$ so as to induce a purely spherical Gaussian.
However, since our sampler just directly takes an integer linear combination of
lattice vectors, and does not require extra precision for handling the inverse $B_1^{-1}$,
it might outperform Peikert’s in some situations, at least when c = 0.

2 Preliminaries 

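We say that a function $f : \mathbb{R}^+ \to \mathbb{R}^+$ is negligible (and write $f(\lambda) < \mathrm{negl}(\lambda)$) if
for every d we have $f(\lambda) < 1/\lambda^d$ for sufficiently large $\lambda$. For two distributions
$\mathcal{D}_1$ and $\mathcal{D}_2$ over some set $\Omega$ the statistical distance $\mathrm{SD}(\mathcal{D}_1, \mathcal{D}_2)$ is

$$\mathrm{SD}(\mathcal{D}_1, \mathcal{D}_2) = \frac{1}{2} \sum_{x \in \Omega} \left| \Pr_{\mathcal{D}_1}[x] - \Pr_{\mathcal{D}_2}[x] \right|.$$

Two distribution ensembles $\mathcal{D}_1(\lambda)$ and $\mathcal{D}_2(\lambda)$ are statistically close or statistically
indistinguishable if $\mathrm{SD}(\mathcal{D}_1(\lambda), \mathcal{D}_2(\lambda))$ is a negligible function of $\lambda$.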
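
For finite distributions the statistical distance can be computed directly from the definition. The sketch below assumes distributions are given as dictionaries from outcomes to probabilities; the function name and the two toy distributions are illustrative only.

```python
from fractions import Fraction

def statistical_distance(d1, d2):
    """SD(D1, D2) = 1/2 * sum_x |D1(x) - D2(x)| for finite distributions
    given as dicts mapping outcomes to probabilities."""
    support = set(d1) | set(d2)
    return sum(abs(d1.get(x, 0) - d2.get(x, 0)) for x in support) / 2

fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(3, 4), 1: Fraction(1, 4)}
sd = statistical_distance(fair, biased)  # (1/4 + 1/4) / 2 = 1/4
```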

2.1 Gaussian Distributions 

For any real s > 0 and vector $c \in \mathbb{R}^n$, define the (spherical) Gaussian function
on $\mathbb{R}^n$ centered at c with parameter s as $\rho_{s,c}(x) = \exp(-\pi\|x - c\|^2/s^2)$
for all $x \in \mathbb{R}^n$. The normal distribution with mean $\mu$ and deviation $\sigma$, denoted
$\mathcal{N}(\mu, \sigma^2)$, assigns to each real number $x \in \mathbb{R}$ the probability density
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot \rho_{\sigma\sqrt{2\pi},\mu}(x)$. The n-dimensional (spherical) continuous Gaussian
distribution with center c and uniform deviation $\sigma$, denoted $\mathcal{N}^n(c, \sigma^2)$, just
chooses each entry of a dimension-n vector independently from $\mathcal{N}(c_i, \sigma^2)$.

The n-dimensional spherical Gaussian function generalizes naturally to ellipsoid
Gaussians, where the different coordinates are jointly Gaussian but are
neither identical nor independent. In this case we replace the single variance
parameter $s^2 \in \mathbb{R}$ by the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ (which must be
positive-definite and symmetric). To maintain consistency of notations between the
spherical and ellipsoid cases, below we let S be a matrix such that $S^T \times S = \Sigma$. Such
a matrix S always exists for a symmetric $\Sigma$, but it is not unique. (In fact there
exist such S’es that are not even n-by-n matrices; below we often work with such
rectangular S’es.)

For a rank-n matrix $S \in \mathbb{R}^{m \times n}$ and a vector $c \in \mathbb{R}^n$, the ellipsoid Gaussian
function on $\mathbb{R}^n$ centered at c with parameter S is defined by

$$\rho_{S,c}(x) = \exp\left(-\pi (x - c)^T (S^T S)^{-1} (x - c)\right) \quad \forall x \in \mathbb{R}^n.$$

Obviously this function only depends on $\Sigma = S^T S$ and not on the particular
choice of S. It is also clear that the spherical case can be obtained by setting
$S = sI_n$, with $I_n$ the n-by-n identity matrix. Below we use the shorthand $\rho_S(\cdot)$
(or $\rho_s(\cdot)$) when the center of the distribution is 0.


2.2 Matrices and Singular Values 

In this note we often use properties of rectangular (non-square) matrices. For
$m \ge n$ and a rank-n matrix³ $X' \in \mathbb{R}^{m \times n}$, the pseudoinverse of $X'$ is the (unique)
m-by-n matrix $Y'$ such that $X'^T Y' = Y'^T X' = I_n$ and the columns of $Y'$ span
the same linear space as those of $X'$. It is easy to see that $Y'$ can be expressed
as $Y' = X'(X'^T X')^{-1}$ (note that $X'^T X'$ is invertible since $X'$ has rank n).

For a rank-n matrix $X' \in \mathbb{R}^{m \times n}$, denote $U_{X'} = \{\|X'u\| : u \in \mathbb{R}^n, \|u\| = 1\}$.
The least singular value of $X'$ is then defined as $\sigma_n(X') = \inf(U_{X'})$ and similarly
the largest singular value of $X'$ is $\sigma_1(X') = \sup(U_{X'})$. Some properties of singular
values that we use later in the text are stated in Fact 1.

Fact 1. For rank-n matrices $X', Y' \in \mathbb{R}^{m \times n}$ with $m \ge n$, the following holds:

1. If $X'^T X' = Y'^T Y'$ then $X'$, $Y'$ have the same singular values.

2. If $Y'$ is the (pseudo)inverse of $X'$ then the singular values of $X'$, $Y'$ are
reciprocals.

3. If $X'$ is a square matrix (i.e., m = n) then $X'$, $X'^T$ have the same singular
values.

4. If $\sigma_1(Y') < \delta\sigma_n(X')$ for some constant $\delta < 1$, then
$\sigma_1(X' + Y') \in [1 - \delta, 1 + \delta]\,\sigma_1(X')$ and $\sigma_n(X' + Y') \in [1 - \delta, 1 + \delta]\,\sigma_n(X')$. □

It is well known that when m is sufficiently larger than n, then the singular values
of a “random matrix” $X' \in \mathbb{R}^{m \times n}$ are all of size roughly $\sqrt{m}$. For example,
Lemma 1 below is a special case of [3, Thm 3.1], and Lemma 2 can be proved
along the same lines of (but much simpler than) the proof of [17, Corollary 2.3.5].

3 We use the notation X' instead of X to avoid confusion later in the text where we 
will instantiate X' = X T . 
Lemma 1. There exists a universal constant C > 1 such that for any m > 2n,
if the entries of $X' \in \mathbb{R}^{m \times n}$ are drawn independently from $\mathcal{N}(0, 1)$ then
$\Pr[\sigma_n(X') < \sqrt{m}/C] < \exp(-O(m))$. □

Lemma 2. There exists a universal constant C > 1 such that for any m > 2n,
if the entries of $X' \in \mathbb{R}^{m \times n}$ are drawn independently from $\mathcal{N}(0, 1)$ then
$\Pr[\sigma_1(X') > C\sqrt{m}] < \exp(-O(m))$. □

Corollary 1. There exists a universal constant C > 1 such that for any m > 2n
and s > 0, if the entries of $X' \in \mathbb{R}^{m \times n}$ are drawn independently from $\mathcal{N}(0, s^2)$
then

$$\Pr\left[ s\sqrt{m}/C < \sigma_n(X') \le \sigma_1(X') < sC\sqrt{m} \right] > 1 - \exp(-O(m)).$$ □

Remark. The literature on random matrices is mostly focused on analyzing the
“hard cases” of more general distributions and m which is very close to n (e.g.,
$m = (1 + o(1))n$ or even m = n). For our purposes, however, we only need the
“easy case” where all the distributions are Gaussian and $m \gg n$ (e.g., $m = n^2$),
in which case all the proofs are much easier (and the universal constant from
Corollary 1 gets closer to one).

2.3 Lattices and Their Dual 

A lattice $L \subset \mathbb{R}^n$ is an additive discrete sub-group of $\mathbb{R}^n$. We denote by span(L)
the linear subspace of $\mathbb{R}^n$ spanned by the points in L. The rank of $L \subset \mathbb{R}^n$ is
the dimension of span(L), and we say that L has full rank if its rank is n. In
this work we often consider lattices of less than full rank.

Every (nontrivial) lattice has bases: a basis for a rank-k lattice L is a set of k
linearly independent points $b_1, \ldots, b_k \in L$ such that $L = \{\sum_{i=1}^k z_i b_i : z_i \in \mathbb{Z}\ \forall i\}$.
If we arrange the vectors $b_i$ as the columns of a matrix $B \in \mathbb{R}^{n \times k}$ then we can
write $L = \{Bz : z \in \mathbb{Z}^k\}$. If B is a basis for L then we say that B spans L.

Definition 1 (Dual of a Lattice). For a lattice $L \subset \mathbb{R}^n$, its dual lattice
consists of all the points in span(L) that are orthogonal to L modulo one, namely:

$$L^* = \{y \in \mathrm{span}(L) : \forall x \in L, \langle x, y \rangle \in \mathbb{Z}\}$$

Clearly, if L is spanned by the columns of some rank-k matrix $X \in \mathbb{R}^{n \times k}$ then
$L^*$ is spanned by the columns of the pseudoinverse of X. It follows from the
definition that for two lattices $L \subseteq M$ we have $M^* \cap \mathrm{span}(L) \subseteq L^*$.
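
As a small worked illustration of the pseudoinverse spanning the dual, the sketch below computes $Y = X(X^TX)^{-1}$ exactly over the rationals for a hypothetical rank-2 lattice in $\mathbb{R}^3$ and checks that every basis vector of L has an integer inner product with every column of Y. The matrix X and all helper names are made up for the example.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a square invertible matrix over the rationals."""
    k = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(k)]
         for i, row in enumerate(A)]
    for c in range(k):
        p = next(r for r in range(c, k) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]

def pseudoinverse(X):
    """Y = X (X^T X)^{-1} for a matrix X of full column rank."""
    Xt = transpose(X)
    return matmul(X, inverse(matmul(Xt, X)))

# Rank-2 lattice in R^3 spanned by the columns of X (hypothetical example).
X = [[1, 0], [1, 2], [0, 1]]
Y = pseudoinverse(X)
gram = matmul(transpose(X), Y)  # X^T Y: entry (i, j) is <x_i, y_j>
```

Here `gram` is the identity matrix, so each $\langle x_i, y_j \rangle$ is 0 or 1, and the columns of Y lie in span(L) because Y is X times an invertible matrix.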

Banaszczyk provided strong transference theorems that relate the size of short
vectors in L to the size of short vectors in $L^*$. Recall that $\lambda_i(L)$ denotes the i-th
minimum of L (i.e., the smallest s such that L contains i linearly independent
vectors of size at most s).

Theorem 1 (Banaszczyk [2]). For any rank-n lattice $L \subset \mathbb{R}^m$, and for all
$i \in [n]$,

$$1 \le \lambda_i(L) \cdot \lambda_{n-i+1}(L^*) \le n.$$
2.4 Gaussian Distributions over Lattices

The ellipsoid discrete Gaussian distribution over lattice L with parameter S,
centered around c, is

$$\forall x \in L, \quad \mathcal{D}_{L,S,c}(x) = \frac{\rho_{S,c}(x)}{\rho_{S,c}(L)},$$

where $\rho_{S,c}(A)$ for a set A denotes $\sum_{x \in A} \rho_{S,c}(x)$. In other words, the probability
$\mathcal{D}_{L,S,c}(x)$ is simply proportional to $\rho_{S,c}(x)$, the denominator being a normalization
factor. The same definitions apply to the spherical case, which is denoted by
$\mathcal{D}_{L,s,c}(\cdot)$ (with lowercase s). As before, when c = 0 we use the shorthand $\mathcal{D}_{L,S}$
(or $\mathcal{D}_{L,s}$). The following useful fact, which follows directly from the definition,
relates the ellipsoid Gaussian distributions over different lattices:

Fact 2. Let $L \subset \mathbb{R}^n$ be a full-rank lattice, $c \in \mathbb{R}^n$ a vector, and $S \in \mathbb{R}^{m \times n}$,
$B \in \mathbb{R}^{n \times n}$ two rank-n matrices, and denote $L' = \{B^{-1}v : v \in L\}$, $c' = B^{-1}c$,
and $S' = S \times (B^T)^{-1}$. Then the distribution $\mathcal{D}_{L,S,c}$ is identical to the distribution
induced by drawing a vector $v \leftarrow \mathcal{D}_{L',S',c'}$ and outputting $u = Bv$. □

A useful special case of Fact 2 is when $L'$ is the integer lattice, $L' = \mathbb{Z}^n$, in which
case L is just the lattice spanned by the basis B. In other words, the ellipsoid
Gaussian distribution on L(B), $v \leftarrow \mathcal{D}_{L(B),S,c}$, is induced by drawing an integer
vector according to $z \leftarrow \mathcal{D}_{\mathbb{Z}^n,S',c'}$ and outputting $v = Bz$, where $S' = S(B^{-1})^T$
and $c' = B^{-1}c$.

Another useful special case is where $S = sB^T$, so S is a square matrix and
$S' = sI_n$. In this case the ellipsoid Gaussian distribution $v \leftarrow \mathcal{D}_{L,S,c}$ is induced
by drawing a vector according to the spherical Gaussian $u \leftarrow \mathcal{D}_{L',s,c'}$ and
outputting $v = \frac{1}{s} S^T u$, where $c' = s(S^T)^{-1}c$ and $L' = \{s(S^T)^{-1}v : v \in L\}$.

Smoothing parameter. As in [10], for lattice L and real $\epsilon > 0$, the smoothing
parameter of L, denoted $\eta_\epsilon(L)$, is defined as the smallest s such that
$\rho_{1/s}(L^* \setminus \{0\}) \le \epsilon$. Intuitively, for a small enough $\epsilon$, the number $\eta_\epsilon(L)$ is sufficiently larger
than L’s fundamental parallelepiped so that sampling from the corresponding
Gaussian “wipes out the internal structure” of L. Thus, the sparser the lattice,
the larger its smoothing parameter.
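
To make the definition concrete, the following sketch computes $\eta_\epsilon(\mathbb{Z})$ numerically. It relies on the fact that $\mathbb{Z}$ is its own dual, so $\rho_{1/s}(\mathbb{Z} \setminus \{0\}) = 2\sum_{k \ge 1} \exp(-\pi s^2 k^2)$, and finds the threshold by bisection. The function names, the truncation of the sum and the bisection bracket are assumptions of this illustration. The result is compared with the Micciancio-Regev upper bound $\lambda_n(L)\sqrt{\ln(2n(1+1/\epsilon))/\pi}$, specialised to n = 1 and $\lambda_1(\mathbb{Z}) = 1$.

```python
import math

def rho_dual_nonzero(s, terms=2000):
    """rho_{1/s}(Z minus {0}) = 2 * sum_{k>=1} exp(-pi * s^2 * k^2).
    Z is self-dual; the sum is truncated after `terms` terms."""
    return 2.0 * sum(math.exp(-math.pi * (s * k) ** 2) for k in range(1, terms + 1))

def smoothing_parameter_Z(eps, lo=0.1, hi=10.0, iters=100):
    """Smallest s with rho_{1/s}(Z minus {0}) <= eps, found by bisection.
    The sum is strictly decreasing in s; eps is assumed to lie between the
    values of the sum at `hi` and at `lo`."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_nonzero(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

eps = 0.01
eta = smoothing_parameter_Z(eps)
# Upper bound in terms of the primal lattice, with n = 1 and lambda_1(Z) = 1.
bound = math.sqrt(math.log(2 * (1 + 1 / eps)) / math.pi)
```

For this choice of $\epsilon$ the computed value is roughly 1.2987, just under the bound of roughly 1.2999.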

It is well known that for a spherical Gaussian with parameter $s > \eta_\epsilon(L)$, the
size of vectors drawn from $\mathcal{D}_{L,s}$ is bounded by $s\sqrt{n}$ whp (cf. [10, Lemma 4.4],
m Corollary 5.3]). The following lemma (that follows easily from the spherical
case and Fact 2) is a generalization to ellipsoid Gaussians.


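Lemma 3. For a rank-n lattice L, vector $c \in \mathbb{R}^n$, constant $0 < \epsilon < 1$ and
matrix S s.t. $\sigma_n(S) > \eta_\epsilon(L)$, we have that for $v \leftarrow \mathcal{D}_{L,S,c}$,

$$\Pr_{v \leftarrow \mathcal{D}_{L,S,c}}\left( \|v - c\| \ge \sigma_1(S)\sqrt{n} \right) \le \frac{1+\epsilon}{1-\epsilon} \cdot 2^{-n}.$$

Moreover, for every $z \in \mathbb{R}^n$ and r > 0 it holds that

$$\Pr_{v \leftarrow \mathcal{D}_{L,S,c}}\left( |\langle v - c, z \rangle| \ge r\sigma_1(S)\|z\| \right) \le 2en \cdot \exp(-\pi r^2).$$

The proof can be found in the long version [1].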

The next lemma says that the Gaussian distribution with parameter $s > \eta_\epsilon(L)$
is so smooth and “spread out” that it covers approximately the same number
of L-points regardless of where the Gaussian is centered. This is again well known
for spherical distributions (cf. [5, Lemma 2.7]) and the generalization to ellipsoid
distributions is immediate using Fact 2.

Lemma 4. For any rank-n lattice L, real $\epsilon \in (0, 1)$, vector $c \in \mathbb{R}^n$, and rank-n
matrix $S \in \mathbb{R}^{m \times n}$ such that $\sigma_n(S) > \eta_\epsilon(L)$, we have
$\rho_{S,c}(L) \in \left[\frac{1-\epsilon}{1+\epsilon}, 1\right] \cdot \rho_S(L)$. □

Regev also proved that drawing a point from L according to a spherical discrete 
Gaussian and adding to it a spherical continuous Gaussian, yields a probability 
distribution close to a continuous Gaussian (independent of the lattice), provided 
that both distributions have parameters sufficiently larger than the smoothing 
parameter of L. 

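Lemma 5 (Claim 3.9 of [14]). Fix any n-dimensional lattice $L \subset \mathbb{R}^n$, real
$\epsilon \in (0, 1/2)$, and two reals s, r such that $\frac{rs}{\sqrt{r^2 + s^2}} > \eta_\epsilon(L)$, and denote $t = \sqrt{r^2 + s^2}$.

Let $\mathcal{R}_{L,r,s}$ be a distribution induced by choosing $x \leftarrow \mathcal{D}_{L,s}$ from the spherical
discrete Gaussian on L and $y \leftarrow \mathcal{N}^n(0, r^2/2\pi)$ from a continuous Gaussian,
and outputting z = x + y. Then for any point $u \in \mathbb{R}^n$, the probability density
$\mathcal{R}_{L,r,s}(u)$ is close to the probability density under the spherical continuous
Gaussian $\mathcal{N}^n(0, t^2/2\pi)$ up to a factor of $\frac{1-\epsilon}{1+\epsilon}$:

$$\frac{1-\epsilon}{1+\epsilon}\, \mathcal{N}^n(0, t^2/2\pi)(u) \le \mathcal{R}_{L,r,s}(u) \le \frac{1+\epsilon}{1-\epsilon}\, \mathcal{N}^n(0, t^2/2\pi)(u)$$

In particular, the statistical distance between $\mathcal{R}_{L,r,s}$ and $\mathcal{N}^n(0, t^2/2\pi)$ is at most $4\epsilon$.

More broadly, Lemma 5 implies that for any event E(u), we have

$$\Pr_{u \leftarrow \mathcal{N}^n(0, t^2/2\pi)}[E(u)] \cdot \frac{1-\epsilon}{1+\epsilon} \le \Pr_{u \leftarrow \mathcal{R}_{L,r,s}}[E(u)] \le \Pr_{u \leftarrow \mathcal{N}^n(0, t^2/2\pi)}[E(u)] \cdot \frac{1+\epsilon}{1-\epsilon}$$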

Another useful property of “wide” discrete Gaussian distributions is that they
do not change much by short shifts. Specifically, if we have an arbitrary subset of
the lattice, $T \subseteq L$, and an arbitrary short vector $v \in L$, then the probability mass
of T is not very different than the probability mass of $T - v = \{u - v : u \in T\}$.
Below let erf(·) denote the Gauss error function.

Lemma 6. Fix a lattice $L \subset \mathbb{R}^n$, a positive real $\epsilon > 0$, and two parameters
s, c such that c > 2 and $s > (1 + c)\eta_\epsilon(L)$. Then for any subset $T \subset L$ and any
additional vector $v \in L$, it holds that

$$\mathcal{D}_{L,s}(T) - \mathcal{D}_{L,s}(T - v) \le \frac{\mathrm{erf}(q(1 + 4/c)/2)}{\mathrm{erf}(2q)} \cdot \frac{1+\epsilon}{1-\epsilon},$$

where $q = \|v\|\sqrt{\pi}/s$.

We provide the proof in Appendix A.1.

One useful special case of Lemma 6 is when c = 100 (say) and $\|v\| \approx s$, where
we get a bound
$\mathcal{D}_{L,s}(T) - \mathcal{D}_{L,s}(T - v) \le \frac{\mathrm{erf}(0.52\sqrt{\pi})}{\mathrm{erf}(2\sqrt{\pi})} \cdot \frac{1+\epsilon}{1-\epsilon} \approx 0.81$. We note
that when $\|v\|/s \to 0$, the bound from Lemma 6 tends to (just over) 1/4, but we
note that we can make it tend to zero with a different choice of parameters in
the proof (namely making $H'_v$ and $H''_v$ thicker, e.g. $H''_v = H_v$ and $H'_v = 2H_v$).
Lemma 6 extends easily also to the ellipsoid Gaussian case, using Fact 2.
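
The two numerical claims in this remark can be checked with a few lines of code. The sketch assumes the bound of Lemma 6 reads $\frac{\mathrm{erf}(q(1+4/c)/2)}{\mathrm{erf}(2q)} \cdot \frac{1+\epsilon}{1-\epsilon}$ with $q = \|v\|\sqrt{\pi}/s$, which is the reading consistent with both the value 0.81 and the limit of just over 1/4; the function name `shift_bound` is made up for the example.

```python
import math

def shift_bound(q, c, eps=0.0):
    """Assumed right-hand side of Lemma 6:
    erf(q(1+4/c)/2) / erf(2q) * (1+eps)/(1-eps), with q = ||v|| sqrt(pi) / s."""
    return math.erf(q * (1 + 4 / c) / 2) / math.erf(2 * q) * (1 + eps) / (1 - eps)

# Special case from the text: c = 100 and ||v|| about s, i.e. q = sqrt(pi).
special = shift_bound(math.sqrt(math.pi), c=100)
# Very short shifts: as q -> 0 the ratio tends to (1 + 4/c)/4, just over 1/4.
tiny = shift_bound(1e-6, c=100)
```

With negligible $\epsilon$, `special` comes out close to 0.81 and `tiny` is close to 0.26.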

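Corollary 2. Fix a lattice $L \subset \mathbb{R}^n$, a positive real $\epsilon > 0$, a parameter c > 2 and
a rank-n matrix S such that $s := \sigma_n(S) > (1 + c)\eta_\epsilon(L)$. Then for any subset
$T \subset L$ and any additional vector $v \in L$, it holds that
$\mathcal{D}_{L,S}(T) - \mathcal{D}_{L,S}(T - v) \le \frac{\mathrm{erf}(q(1 + 4/c)/2)}{\mathrm{erf}(2q)} \cdot \frac{1+\epsilon}{1-\epsilon}$, where $q = \|v\|\sqrt{\pi}/s$.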

Micciancio and Regev give the following bound on the smoothing parameter in 
terms of the primal lattice. 

Lemma 7 (Lemma 3.3 of [10]). For any n-dimensional lattice L and positive
real $\epsilon > 0$,

$$\eta_\epsilon(L) \le \lambda_n(L) \cdot \sqrt{\frac{\ln(2n(1 + 1/\epsilon))}{\pi}}.$$

In particular, for any superlogarithmic function $\omega(\log n)$, there exists a negligible
function $\epsilon(n)$ such that $\eta_\epsilon(L) \le \sqrt{\omega(\log n)} \cdot \lambda_n(L)$.

3 Our Discrete Gaussian LHL 

Consider a full rank lattice $L \subset \mathbb{Z}^n$, some negligible $\epsilon = \epsilon(n)$, the corresponding
smoothing parameter $\eta = \eta_\epsilon(L)$ and parameters $s > \Omega(\eta)$, $m > \Omega(n \log n)$, and
$s' > \Omega(\mathrm{poly}(n) \log(1/\epsilon))$. The process that we analyze begins by choosing “once
and for all” m points in L, drawn independently from a discrete Gaussian with
parameter s, $x_i \leftarrow \mathcal{D}_{L,s}$.⁴

Once the $x_i$’s are fixed, we arrange them as the columns of an n-by-m matrix
$X = (x_1 | x_2 | \cdots | x_m)$, and consider the distribution $\mathcal{E}_{X,s'}$, induced by choosing
an integer vector v from a discrete spherical Gaussian with parameter $s'$ and
outputting $y = X \cdot v$:

$$\mathcal{E}_{X,s'} := \{X \cdot v : v \leftarrow \mathcal{D}_{\mathbb{Z}^m,s'}\}. \qquad (1)$$

Our goal is to prove that $\mathcal{E}_{X,s'}$ is close to the ellipsoid Gaussian $\mathcal{D}_{L,s'X^T}$
over L. We begin by proving that the singular values of $X^T$ are all roughly of
the size $s\sqrt{m}$.⁵

4 More generally, we can consider drawing the vectors $x_i$ from an ellipsoid discrete
Gaussian, $x_i \leftarrow \mathcal{D}_{L,S}$, so long as the least singular value of S is at least s.

5 Since we eventually apply the following lemmas to X T , we will use X T in the 
statement of the lemmas for consistency at the risk of notational clumsiness. 
Lemma 8. There exists a universal constant K > 1 such that for all m > 2n,
$\epsilon > 0$ and every n-dimensional real lattice $L \subset \mathbb{R}^n$, the following holds: choosing
the rows of an m-by-n matrix $X^T$ independently at random from a spherical
discrete Gaussian on L with parameter $s > 2K\eta_\epsilon(L)$, $X^T \leftarrow (\mathcal{D}_{L,s})^m$, we have

$$\Pr\left[ s\sqrt{2\pi m}/K < \sigma_n(X^T) \le \sigma_1(X^T) < sK\sqrt{2\pi m} \right] > 1 - (4m\epsilon + O(\exp(-m/K))).$$

The proof can be found in the long version [1].

3.1 The Distribution $\mathcal{E}_{X,s'}$ over $\mathbb{Z}^n$

We next move to show that with high probability over the choice of X, the
distribution $\mathcal{E}_{X,s'}$ is statistically close to the ellipsoid discrete Gaussian $\mathcal{D}_{L,s'X^T}$.
We first prove this for the special case of the integer lattice, $L = \mathbb{Z}^n$, and then
use that special case to prove the same statement for general lattices. In either
case, we analyze the setting where the columns of X are chosen from an ellipsoid
Gaussian which is “not too small” and “not too skewed.”

Parameters. Below n is the security parameter and $\epsilon = \mathrm{negligible}(n)$. Let S
be an n-by-n matrix such that $\sigma_n(S) \ge 2K\eta_\epsilon(\mathbb{Z}^n)$, and denote $s_1 = \sigma_1(S)$,
$s_n = \sigma_n(S)$, and $w = s_1/s_n$. (We consider w to be a measure for the “skewness”
of S.) Also let m, q, s' be parameters satisfying $m \ge 10n \log q$, $q \ge 8m^{5/2}n^{1/2}s_1 w$,
and $s' \ge 4wm^{3/2}n^{1/2}\ln(1/\epsilon)$. An example setting of parameters to keep in mind
is $m = n^2$, $s_n = \sqrt{n}$ (which implies $\epsilon \approx 2^{-\sqrt{n}}$), $s_1 = n$ (so $w = \sqrt{n}$), $q = 8n^7$,
and $s' = n^5$.
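
The example setting can be checked against the three constraints mechanically. The sketch below assumes the constraints are non-strict ($m \ge 10n\log q$, $q \ge 8m^{5/2}n^{1/2}s_1w$, $s' \ge 4wm^{3/2}n^{1/2}\ln(1/\epsilon)$), since the example value $q = 8n^7$ meets its bound with equality. It also assumes that log is taken base 2 and that $\epsilon = 2^{-\sqrt{n}}$ exactly. The function names are illustrative.

```python
import math

def example_parameters(n):
    """The example setting: m = n^2, s_n = sqrt(n), eps = 2^(-sqrt(n)) (assumed
    exact), s_1 = n, w = s_1 / s_n, q = 8 n^7, s' = n^5."""
    s_n, s_1 = math.sqrt(n), float(n)
    return {"m": n ** 2, "s_n": s_n, "s_1": s_1, "w": s_1 / s_n,
            "q": 8 * n ** 7, "s_prime": n ** 5,
            "ln_inv_eps": math.sqrt(n) * math.log(2)}

def check_constraints(n, p):
    """Evaluate the three constraints; log is taken base 2 (an assumption)."""
    m, w = p["m"], p["w"]
    q_min = 8 * m ** 2.5 * math.sqrt(n) * p["s_1"] * w
    s_prime_min = 4 * w * m ** 1.5 * math.sqrt(n) * p["ln_inv_eps"]
    return {"m": m >= 10 * n * math.log2(p["q"]),
            "q": p["q"] >= q_min * (1 - 1e-9),  # tolerate float rounding
            "s_prime": p["s_prime"] >= s_prime_min}

n = 1024
ok = check_constraints(n, example_parameters(n))
```

Under these assumptions all three constraints hold for n = 1024, while the constraint on m fails for small n (for instance n = 16), so the example setting should be read as an asymptotic one.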

Theorem 2. For $\epsilon$ negligible in n, let $S \in \mathbb{R}^{n \times n}$ be a matrix such that
$s_n = \sigma_n(S) \ge 18K\eta_\epsilon(\mathbb{Z}^n)$, and denote $s_1 = \sigma_1(S)$ and $w = s_1/s_n$. Also let m, s' be
parameters such that $m \ge 10n \log(8m^{5/2}n^{1/2}s_1 w)$ and $s' \ge 4wm^{3/2}n^{1/2}\ln(1/\epsilon)$.

Then, when choosing the columns of an n-by-m matrix X from the ellipsoid
Gaussian over $\mathbb{Z}^n$, $X \leftarrow (\mathcal{D}_{\mathbb{Z}^n,S})^m$, we have with all but probability $2^{-O(m)}$
over the choice of X, that the statistical distance between $\mathcal{E}_{X,s'}$ and the ellipsoid
Gaussian $\mathcal{D}_{\mathbb{Z}^n,s'X^T}$ is bounded by $2\epsilon$.

The rest of this subsection is devoted to proving Theorem 2. We begin by showing
that with overwhelming probability, the columns of X span all of $\mathbb{Z}^n$, which
means also that the support of $\mathcal{E}_{X,s'}$ includes all of $\mathbb{Z}^n$.

Lemma 9. With parameters as above, when drawing the columns of an n-by-m
matrix X independently at random from $\mathcal{D}_{\mathbb{Z}^n,S}$ we get $X \cdot \mathbb{Z}^m = \mathbb{Z}^n$ with all but
probability $2^{-O(m)}$.

The proof can be found in the long version [1].

From now on we assume that the columns of X indeed span all of $\mathbb{Z}^n$. Now
let A = A(X) be the (m − n)-dimensional lattice in $\mathbb{Z}^m$ orthogonal to all the
rows of X, and for any $z \in \mathbb{Z}^n$ we denote by $A_z = A_z(X)$ the z coset of A:

$$A = A(X) := \{v \in \mathbb{Z}^m : X \cdot v = 0\} \quad \text{and} \quad A_z = A_z(X) := \{v \in \mathbb{Z}^m : X \cdot v = z\}.$$

Since the columns of X span all of $\mathbb{Z}^n$, $A_z$ is nonempty for every $z \in \mathbb{Z}^n$,
and we have $A_z = v_z + A$ for any arbitrary point $v_z \in A_z$.
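
The coset structure can be seen on a toy instance. The sketch below uses a hypothetical 2-by-3 integer matrix X whose columns span $\mathbb{Z}^2$, enumerates the points of A and of one coset $A_z$ inside a small box, and checks that every point of $A_z$ differs from a fixed $v_z \in A_z$ by an element of A. The matrix, the box radius and the function names are all made up for the example.

```python
from itertools import product

def mat_vec(X, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in X)

def coset_in_box(X, z, radius):
    """All integer v with entries in [-radius, radius] and X v = z."""
    m = len(X[0])
    box = product(range(-radius, radius + 1), repeat=m)
    return [v for v in box if mat_vec(X, v) == tuple(z)]

X = [[1, 2, 3], [0, 1, 1]]           # hypothetical; its columns span Z^2
A_box = coset_in_box(X, (0, 0), 3)   # points of A = {v : X v = 0} in the box
Az_box = coset_in_box(X, (1, 2), 3)  # points of the coset A_z for z = (1, 2)
v_z = Az_box[0]
# Every point of A_z differs from v_z by an element of A.
diffs = [tuple(a - b for a, b in zip(v, v_z)) for v in Az_box]
```

In this instance A is the rank-1 lattice generated by (−1, −1, 1), in line with A having rank m − n = 1.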

Below we prove that the smoothing parameter of A is small (whp), and use
that to bound the distance between $\mathcal{E}_{X,s'}$ and $\mathcal{D}_{\mathbb{Z}^n,s'X^T}$. First we show that if
the smoothing parameter of A is indeed small (i.e., smaller than the parameter
$s'$ used to sample the coefficient vector v), then $\mathcal{E}_{X,s'}$ and $\mathcal{D}_{\mathbb{Z}^n,s'X^T}$ must be
close.

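Lemma 10. Fix X and A = A(X) as above. If $s' > \eta_\epsilon(A)$, then for any point
$z \in \mathbb{Z}^n$, the probability mass assigned to z by $\mathcal{E}_{X,s'}$ differs from that assigned by
$\mathcal{D}_{\mathbb{Z}^n,s'X^T}$ by at most a factor of $(1 - \epsilon)/(1 + \epsilon)$, namely

$$\mathcal{E}_{X,s'}(z) \in \left[\frac{1-\epsilon}{1+\epsilon}, 1\right] \cdot \mathcal{D}_{\mathbb{Z}^n,s'X^T}(z).$$

In particular, if $\epsilon < 1/3$ then the statistical distance between $\mathcal{E}_{X,s'}$ and $\mathcal{D}_{\mathbb{Z}^n,s'X^T}$
is at most $2\epsilon$.

The proof can be found in Appendix A.2.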


The Smoothing Parameter of A. We now turn our attention to proving
that A is “smooth enough”. Specifically, for the parameters above we prove that
with high probability over the choice of X, the smoothing parameter $\eta_\epsilon(A)$ is
at most $s' = 4wm^{3/2}n^{1/2}\ln(1/\epsilon)$.

Recall again that A = A(X) is the rank-(m − n) lattice containing all the
integer vectors in $\mathbb{Z}^m$ orthogonal to the rows of X. We extend A to a full-rank
lattice as follows: First we extend the row space of X, by throwing in
also the scaled standard unit vectors $qe_i$ for the integer parameter q mentioned
above ($q \ge 8m^{5/2}n^{1/2}s_1 w$). That is, we let $M_q = M_q(X)$ be the full-rank
m-dimensional lattice spanned by the rows of X and the vectors $qe_i$,

$$M_q = \{X^T z + qy : z \in \mathbb{Z}^n, y \in \mathbb{Z}^m\} = \{u \in \mathbb{Z}^m : \exists z \in \mathbb{Z}_q^n \text{ s.t. } u \equiv X^T z \pmod{q}\}$$

(where we identify $\mathbb{Z}_q$ above with the set $[-q/2, q/2) \cap \mathbb{Z}$). Next, let $A_q$ be the
dual of $M_q$, scaled up by a factor of q, i.e.,

$$A_q = qM_q^* = \{v \in \mathbb{R}^m : \forall u \in M_q, \langle v, u \rangle \in q\mathbb{Z}\}$$

$$= \{v \in \mathbb{R}^m : \forall z \in \mathbb{Z}_q^n, y \in \mathbb{Z}^m, \ z^T X \cdot v + q\langle v, y \rangle \in q\mathbb{Z}\}$$

It is easy to see that $A \subseteq A_q$, since any $v \in A$ is an integer vector (so $q\langle v, y \rangle \in q\mathbb{Z}$
for all $y \in \mathbb{Z}^m$) and orthogonal to the rows of X (so $z^T X \cdot v = 0$ for all $z \in \mathbb{Z}_q^n$).

Obviously all the rows of X belong to $M_q$, and whp they are linearly independent
and relatively short (i.e., of size roughly $s_1\sqrt{m}$). In Lemma 11 below
we show, however, that whp over the choice of X, these are essentially the only
short vectors in $M_q$.

Lemma 11. Recall that we choose X as $X \leftarrow (\mathcal{D}_{\mathbb{Z}^n,S})^m$, and let $w = \sigma_1(S)/\sigma_n(S)$
be a measure of the “skewness” of S. The (n+1)-st minimum of the lattice $M_q = M_q(X)$
is at least $q/(4w\sqrt{mn})$, except with negligible probability over the choice of
X. Namely, $\Pr_{X \leftarrow (\mathcal{D}_{\mathbb{Z}^n,S})^m}[\lambda_{n+1}(M_q) < q/(4w\sqrt{nm})] < 2^{-O(m)}$.
Proof. We prove that with high probability over the choice of X, every vector
in $M_q$ which is not in the linear span of the rows of X is of size at least $q/4nw$.

Recall that every vector in $M_q$ is of the form $X^T z + qy$ for some $z \in \mathbb{Z}_q^n$ and
$y \in \mathbb{Z}^m$. Let us denote by $[v]_q$ the modular reduction of all the entries in v into
the interval $[-q/2, q/2)$, then clearly for every $z \in \mathbb{Z}_q^n$

$$\|[X^T z]_q\| = \inf\{\|X^T z + qy\| : y \in \mathbb{Z}^m\}.$$

Moreover, for every $z \in \mathbb{Z}_q^n$, $y \in \mathbb{Z}^m$, if $X^T z + qy \ne [X^T z]_q$ then $\|X^T z + qy\| \ge q/2$.
Thus it suffices to show that every vector of the form $[X^T z]_q$ which is not
in the linear span of the rows of X has size at least $q/4nw$ (whp over the choice
of X).
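
The centered reduction $[v]_q$ and its minimality among all representatives $v + qy$ can be illustrated on a small example. The vector `u` below merely stands in for $X^T z$, and the modulus and brute-force search range are arbitrary choices for the illustration.

```python
from itertools import product

def centered_mod(v, q):
    """[v]_q: reduce every entry of v into the interval [-q/2, q/2)."""
    return [((x + q // 2) % q) - q // 2 for x in v]

def sq_norm(v):
    return sum(x * x for x in v)

q = 11
u = [7, -9, 3, 16]              # stands in for X^T z (hypothetical values)
reduced = centered_mod(u, q)    # entries become -4, 2, 3, 5
# Brute force: minimise ||u + q*y||^2 over y with entries in {-3, ..., 3}.
best = min(sq_norm([a + q * b for a, b in zip(u, y)])
           for y in product(range(-3, 4), repeat=len(u)))
```

The brute-force minimum equals the squared norm of the reduced vector, and any other representative has some entry outside $[-q/2, q/2)$, hence norm at least q/2.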

Fix a particular vector $z \in \mathbb{Z}_q^n$ (i.e., an integer vector with entries in $[-q/2, q/2)$).
For this fixed vector z, let $i_{max}$ be the index of the largest entry in z (in absolute
value), and let $z_{max}$ be the value of that entry. Considering the vector $v = [X^T z]_q$
for a random matrix X whose columns are drawn independently from the distribution
$\mathcal{D}_{\mathbb{Z}^n,S}$, each entry of v is the inner product of the fixed vector z with a random
vector $x_i \leftarrow \mathcal{D}_{\mathbb{Z}^n,S}$, reduced modulo q into the interval $[-q/2, +q/2)$.

Denoting $s_1 = \sigma_1(S)$ and $s_n = \sigma_n(S)$, we now have two cases: either z
is “small”, i.e., $|z_{max}| < q/(2s_1\sqrt{nm})$, or it is “large”, $|z_{max}| \ge q/(2s_1\sqrt{nm})$.
By the “moreover” part in Lemma 3 (with $r = \sqrt{m}$), for each $x_i$ we have
$|\langle x_i, z \rangle| \le s_1\sqrt{m}\|z\|$ except with probability bounded below $2^{-m}$. If z is “small”
then $\|z\| \le \sqrt{n}\,|z_{max}| < q/(2s_1\sqrt{m})$ and so we get

$$|\langle x_i, z \rangle| \le \|z\| \cdot s_1\sqrt{m} < q/2$$

except with probability $< 2^{-m}$. Hence except with probability $m2^{-m}$ all the
entries of $X^T z$ are smaller than q/2 in magnitude, which means that $[X^T z]_q = X^T z$,
and so $[X^T z]_q$ belongs to the row space of X. Using the union bound
again, we get that with all but probability $q^n \cdot m2^{-m} < m2^{-9m/10}$, the vectors
$[X^T z]_q$ for all the “small” z’s belong to the row space of X.

We next turn to analyzing “large” z’s. Fix one “large” vector z, and for
that vector define the set of “bad” vectors $x \in \mathbb{Z}^n$, i.e., the ones for which
$|[\langle z, x \rangle]_q| < q/4nw$ (and the other vectors $x \in \mathbb{Z}^n$ are “good”). Observe that if
x is “bad”, then we can get a “good” vector by adding to it the $i_{max}$’th standard
unit vector, scaled up by a factor of $\mu = \min(\lceil s_n \rceil, \lfloor q/|2z_{max}| \rfloor)$, since

$$|[\langle z, x + \mu e_{i_{max}} \rangle]_q| = |[\langle z, x \rangle + \mu z_{max}]_q| \ge \mu|z_{max}| - |[\langle z, x \rangle]_q| \ge q/4nw.$$

(The last two inequalities follow from $q/2nw < \mu|z_{max}| \le q/2$ and
$|[\langle z, x \rangle]_q| < q/(4w\sqrt{mn})$.) Hence the injection $x \mapsto x + \mu e_{i_{max}}$ maps “bad” x’s to “good”
x’s. Moreover, since the x’s are chosen according to the wide ellipsoid Gaussian
$\mathcal{D}_{\mathbb{Z}^n,S}$ with $\sigma_n(S) = s_n > \eta_\epsilon(\mathbb{Z}^n)$, and since the scaled standard unit
vectors are short, $\mu \le s_n + 1$, then by Lemma 6 the total probability mass
of the “bad” vectors x differs from the total mass of the “good” vectors
$x + \mu e_{i_{max}}$ by at most 0.81. It follows that when choosing $x \leftarrow \mathcal{D}_{\mathbb{Z}^n,S}$, we have
$\Pr_x[|[\langle z, x \rangle]_q| < q/(4w\sqrt{mn})] \le (1 + 0.81)/2 < 0.91$. Thus the probability that
all the entries of $[X^T z]_q$ are smaller than $q/(4w\sqrt{nm})$ in magnitude is bounded
by $(0.91)^m \approx 2^{-0.14m}$. Since $m > 10n \log q$, we can use the union bound to
conclude that the probability that there exists some “large” vector for which
$\|[X^T z]_q\| < q/(4w\sqrt{mn})$ is no more than $q^n \cdot 2^{-0.14m} < 2^{-O(m)}$.

Summing up the two cases, with all but probability $2^{-O(m)}$ over the choice
of X, there does not exist any vector $z \in \mathbb{Z}_q^n$ for which $[X^T z]_q$ is linearly
independent of the rows of X and yet $\|[X^T z]_q\| < q/(4w\sqrt{mn})$.

Corollary 3. With the parameters as above, the smoothing parameter of A =
A(X) satisfies $\eta_\epsilon(A) \le s' = 4wm^{3/2}n^{1/2}\ln(1/\epsilon)$, except with probability $2^{-O(m)}$.

The proof can be found in the long version [1].

Putting together Lemma 10 and Corollary 3 completes the proof of Theorem 2.

□


3.2 The Distribution $\mathcal{E}_{X,s'}$ over General Lattices

Armed with Theorem 2, we turn to prove the same theorem also for general
lattices.

Theorem 3. Let $L \subset \mathbb{R}^n$ be a full-rank lattice and $B$ a matrix whose columns
form a basis of $L$. Also let $M \in \mathbb{R}^{n \times n}$ be a full rank matrix, and denote
$S = M(B^T)^{-1}$, $s_1 = \sigma_1(S)$, $s_n = \sigma_n(S)$, and $w = s_1/s_n$. Finally let $\epsilon$ be negligible
in $n$ and $m, s'$ be parameters such that $m \ge 10 n \log(8 m^{5/2} n^{1/2} s_1 w)$ and
$s' \ge 4 w m^{3/2} n^{1/2} \ln(1/\epsilon)$.

If $s_n \ge \eta_\epsilon(\mathbb{Z}^n)$ then, when choosing the columns of an $n$-by-$m$ matrix $X$ from
the ellipsoid Gaussian over $L$, $X \leftarrow (\mathcal{D}_{L,M})^m$, we have with all but probability
$2^{-O(m)}$ over the choice of $X$, that the statistical distance between $\mathcal{E}_{X,s'}$ and the
ellipsoid Gaussian $\mathcal{D}_{L,s'X^T}$ is bounded by $2\epsilon$.

This theorem is an immediate corollary of Theorem 2 and Fact 2. The proof
can be found in the long version [1].

4 Applications 

In this section, we discuss the application of our discrete Gaussian LHL in the
construction of multilinear maps from lattices [5]. This construction is illustrative
of a “canonical setting” where our lemma should be useful.

Brief overview of the GGH Construction. To begin, we provide a very high level
overview of the GGH construction, skipping most details. We refer the reader to
[5] for a complete description. In [5], the mapping $a \mapsto g^a$ from bilinear maps is
viewed as a form of “encoding” $a \mapsto \mathrm{Enc}(a)$ that satisfies some properties:

1. Encoding is easy to compute in the forward direction and hard to invert. 

2. Encoding is additively homomorphic and also one-time multiplicatively ho- 
momorphic (via the pairing). 
3. Given Enc(a), Enc(b) it is easy to test whether a = b.

4. Given encodings, it is hard to test more complicated relations between the
underlying scalars. For example, BDDH roughly means that given Enc(a),
Enc(b), Enc(c), Enc(d) it is hard to test if d = abc.
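The additive homomorphism and the equality test in this list can be illustrated with a toy “encoding in the exponent”. This is only a sketch of the classical $a \mapsto g^a$ view: the modulus `P` and base `G` are hypothetical toy values with no security, and the one-time multiplication via a pairing is not modelled.

```python
# Toy "encoding in the exponent": Enc(a) = G^a mod P.
P = 1019           # small prime, for illustration only (hypothetical parameter)
G = 2              # base element (hypothetical choice)
ORDER = P - 1      # exponents can be reduced modulo P - 1 by Fermat's little theorem

def enc(a: int) -> int:
    """Deterministic encoding of the scalar a."""
    return pow(G, a % ORDER, P)

def add(enc_a: int, enc_b: int) -> int:
    """Additive homomorphism: Enc(a) * Enc(b) = Enc(a + b)."""
    return (enc_a * enc_b) % P

def same(enc_a: int, enc_b: int) -> bool:
    """Equality test on encodings (trivial here because enc is deterministic)."""
    return enc_a == enc_b

print(same(add(enc(5), enc(7)), enc(12)))
```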

In [5], the authors construct encodings from ideal lattices that approximately
satisfy (and generalize) the above properties. Skipping most of the details, [5] roughly
used a specific (NTRU-like) lattice-based homomorphic encryption scheme, where
Enc(a) is just an encryption of a. The ability to add and multiply then just follows
from the homomorphism of the underlying cryptosystem, and GGH described how
to add to this cryptosystem a “broken secret key” that cannot be used for decryption
but is good enough for testing if two ciphertexts encrypt the same element. (In
the terminology from [5], this broken key is called the zero-test parameter.)

In the specific cryptosystem used in the GGH construction, ciphertexts are
elements in some polynomial ring (represented as vectors in $\mathbb{Z}^n$), and
additive/multiplicative homomorphism is implemented simply by addition and
multiplication in the ring. A natural way to enable encoding is to publish a
single ciphertext that encrypts/encodes 1, $y_1 = \mathrm{Enc}(1)$. To encode any other
plaintext element $a$, we can use the multiplicative homomorphism by setting
$\mathrm{Enc}(a) = a \cdot y_1$ in the ring. However, this simple encoding is certainly not hard
to decode: just dividing by $y_1$ in the ring suffices! For the same reason, it is also
not hard to determine “complex relations” between encodings.
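The weakness of the naive encoding can be seen in a few lines. The sketch below replaces the polynomial ring by the integers modulo a toy prime `Q`, which is an assumption made purely to keep the example short; the function names are hypothetical.

```python
Q = 7919  # toy prime modulus; the actual scheme works in a polynomial ring

def naive_encode(a: int, y1: int) -> int:
    """Enc(a) = a * y1, using only the multiplicative structure of the ring."""
    return (a * y1) % Q

def naive_decode(u: int, y1: int) -> int:
    """Anyone who knows the public y1 can divide it out again."""
    return (u * pow(y1, -1, Q)) % Q

y1 = 1234                     # published encoding of 1 (hypothetical value)
u = naive_encode(42, y1)
print(naive_decode(u, y1))
```

The same division also reveals algebraic relations between encodings, which is what the randomizers described next are meant to prevent.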

Randomizing the encodings. To break these simple algebraic relations, the
authors include in the public parameters also “randomizers” $x_i$ ($i = 1, \ldots, m$),
which are just random encryptions/encodings of zero, namely $x_i \leftarrow \mathrm{Enc}(0)$.
Then to re-randomize the encoding $u_a = a \cdot y_1$, they add to it a “random linear
combination” of the $x_i$'s, and (by additive homomorphism) this is another
encoding of the same element. This approach seems to thwart the simple
algebraic decoding from above, but what can be said about the resulting encodings?
Here is where GGH use our results to analyze the probability distribution
of these re-randomized encodings.

In a little more detail, an instance of the GGH encoding includes an ideal
lattice $L$ and a secret ring element $z$, and an encoding of an element $a$ has the
form $e_a/z$ where $e_a$ is a short element that belongs to the same coset of $L$ as the
“plaintext” $a$. The $x_i$'s are therefore ring elements of the form $b_i/z$ where the
$b_i$'s are short vectors in $L$. Denote by $X$ the matrix with the $x_i$ as columns
and by $B$ the matrix with the numerators $b_i$ as columns, i.e., $X = (x_1 | \ldots | x_m)$
and $B = (b_1 | \ldots | b_m)$. Re-randomizing the encoding $u_a = e_a/z$ is done by
choosing a random coefficient vector $r \leftarrow \mathcal{D}_{\mathbb{Z}^m,\sigma^*}$ (for large enough $\sigma^*$), and
setting

$$u' := u_a + Xr = \frac{e_a + Br}{z}.$$

Since all the $b_i$'s are in the lattice $L$, obviously $e_a + Br$ is in the same coset
of $L$ as $e_a$ itself. Moreover, since the $b_i$'s are short and so are the coefficients
of $r$, so is $e_a + Br$. Hence $u'$ is a valid encoding of the same plaintext $a$
that was encoded in $u_a$.
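The coset argument can be checked on a toy example. In the sketch below the ideal lattice is replaced by the scaled integer lattice `D * Z^N`, and the discrete Gaussian by a rounded continuous Gaussian; both are simplifying assumptions for illustration, and all names are hypothetical. Only the numerator $e_a + Br$ is modelled, since the division by $z$ does not affect the coset.

```python
import random

D = 5            # toy lattice L = D * Z^N (stand-in for the ideal lattice)
N, M = 4, 12     # dimension and number of randomizers

def in_same_coset(u, v):
    """u and v lie in the same coset of L iff u - v has all entries divisible by D."""
    return all((ui - vi) % D == 0 for ui, vi in zip(u, v))

def rerandomize(e_a, B, sigma, rng):
    """Return e_a + B r, with r a rounded Gaussian (stand-in for D_{Z^m, sigma*})."""
    r = [round(rng.gauss(0, sigma)) for _ in range(len(B))]
    out = list(e_a)
    for coeff, b in zip(r, B):          # B is stored column by column
        for k in range(len(out)):
            out[k] += coeff * b[k]
    return out

rng = random.Random(0)
B = [[D * rng.randint(-2, 2) for _ in range(N)] for _ in range(M)]  # short vectors of L
e_a = [3, 1, 4, 1]                                                  # short coset representative
e_new = rerandomize(e_a, B, sigma=10.0, rng=rng)
print(in_same_coset(e_a, e_new))
```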

Finally, using our Theorem 3 from this work, GGH can claim that the distribution
of $u'$ is nearly independent of the original $u_a$ (conditioned on its coset). If
the $b_i$'s are chosen from a wide enough spherical distribution, then our Gaussian
LHL allows them to conclude that $Br$ is close to a wide ellipsoid Gaussian. With
appropriate choice of $\sigma^*$ the “width” of that distribution is much larger than
the original $e_a$, hence the distribution of $e_a + Br$ is nearly independent of $e_a$,
conditioned on the coset it belongs to.


5 Discussion 

Unlike the classic LHL, our lattice version of LHL is less than perfect - instead
of yielding a perfectly spherical Gaussian, it only gives us an approximately
spherical one, i.e. $\mathcal{D}_{L,s'X^T}$. Here approximately spherical means that all the
singular values of the matrix $X^T$ are within a small, constant sized interval. It
is therefore natural to ask: 1) Can we do better and obtain a perfectly spherical
Gaussian? 2) Is an approximately spherical Gaussian sufficient for cryptographic
applications?

First let us consider whether we can make the Gaussian perfectly spherical.
Indeed, as the number of lattice vectors $m$ grows larger, we expect the greatest
and least singular value of the discrete Gaussian matrix $X$ to converge - this
would imply that as $m \to \infty$, the linear combination $\sum_{i=1}^{m} z_i x_i$ does indeed
behave like a spherical Gaussian. While we do not prove this, we refer the reader
to [16] for intuitive evidence. However, the focus of this work is small $m$ (e.g.,
$m = O(n)$) suitable for applications, in which case we do not know how to prove
the same.

This leads to the second question: is approximately spherical good enough? 
This depends on the application. We have already seen that it is sufficient for 
GGH encodings [5] , where a canonical, wide-enough, but non-spherical Gaussian 
is used to “drown out” an initial encoding, and send it to a canonical distribu- 
tion of encodings that encode the same value. Our LHL shows that one can 
sample from such a canonical approximate Gaussian distribution without using 
the initial Gaussian samples “wastefully” . 

On the other hand, we caution the reader that if the application requires the 
basis vectors $x_1, \ldots, x_m$ to be kept secret (such as when the basis is a trapdoor),
then one must carefully consider whether our Gaussian sampler can be used 
safely. This is because, as demonstrated by m and [1], lattice applications 
where the basis is desired to be secret can be broken completely even if partial 
information about the basis is leaked. In an application where the trapdoor is 
available explicitly and oblivious sampling is not needed, it is safer to use the 
samplers of [6] or [13] to sample a perfectly spherical Gaussian that is statistically 
independent of the trapdoor. 
A More Proofs

A.1 Proof of Lemma 6

Proof. Clearly for any fixed $v$, the set that maximizes $\mathcal{D}_{L,s}(T) - \mathcal{D}_{L,s}(T - v)$
is the set of all vectors $u \in L$ for which $\mathcal{D}_{L,s}(u) > \mathcal{D}_{L,s}(u - v)$, which we
denote by $T_v := \{u \in L : \mathcal{D}_{L,s}(u) > \mathcal{D}_{L,s}(u - v)\}$. Observe that for any $u \in L$
we have $\mathcal{D}_{L,s}(u) > \mathcal{D}_{L,s}(u - v)$ iff $\rho_s(u) > \rho_s(u - v)$, which is equivalent to
$\|u\| < \|u - v\|$. That is, $u$ must lie in the half-space whose projection on $v$ is
less than half of $v$, namely $\langle u, v\rangle < \|v\|^2/2$. In other words we have

$$T_v = \{u \in L : \langle u, v\rangle < \|v\|^2/2\},$$

which also means that $T_v - v = \{u \in L : \langle u, v\rangle < -\|v\|^2/2\} \subseteq T_v$. We can
therefore express the difference in probability mass as $\mathcal{D}_{L,s}(T_v) - \mathcal{D}_{L,s}(T_v - v) =
\mathcal{D}_{L,s}(T_v \setminus (T_v - v))$. Below we denote this set-difference by

$$H_v := T_v \setminus (T_v - v) = \Big\{u \in L : \langle u, v\rangle \in \Big[-\frac{\|v\|^2}{2}, \frac{\|v\|^2}{2}\Big)\Big\}.$$

That is, $H_v$ is the “slice” in space of width $\|v\|$ in the direction of $v$, which is
symmetric around the origin. The arguments above imply that for any set $T$ we
have $\mathcal{D}_{L,s}(T) - \mathcal{D}_{L,s}(T - v) \le \mathcal{D}_{L,s}(H_v)$. The rest of the proof is devoted to
upper-bounding the probability mass of that slice, i.e., $\mathcal{D}_{L,s}(H_v) = \Pr_{u \leftarrow \mathcal{D}_{L,s}}[u \in H_v]$.

To this end we consider the slightly thicker slice, say $H'_v = (1 + \frac{4}{c}) H_v$, and the
random variable $w$, which is obtained by drawing $u \leftarrow \mathcal{D}_{L,s}$ and adding to it a
continuous Gaussian variable of “width” $s/c$. We argue that $w$ is somewhat likely
to fall outside of the thick slice $H'_v$, but conditioning on $u \in H_v$ we have that
$w$ is very unlikely to fall outside of $H'_v$. Putting these two arguments together,
we get that $u$ must have significant probability of falling outside $H_v$, thereby
getting our upper bound.

In more detail, denoting $r = s/c$ we consider drawing $u \leftarrow \mathcal{D}_{L,s}$ and
$z \leftarrow \mathcal{N}^n(0, r^2/2\pi)$, and setting $w = u + z$. Denoting $t = \sqrt{r^2 + s^2}$, we have that
$s \le t \le s(1 + \frac{1}{c})$ and $rs/t \ge s/(c + 1) \ge \eta_\epsilon(L)$. Thus the conditions of Lemma 5
are met, and we get that $w$ is distributed close to a normal random variable
$\mathcal{N}^n(0, t^2/2\pi)$, up to a factor of at most $\frac{1+\epsilon}{1-\epsilon}$.
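The two inequalities on $t$ follow from $r = s/c$ by elementary algebra; a short derivation is sketched below. The final comparison $s/(c+1) \ge \eta_\epsilon(L)$ is not derived here: it is assumed to come from the hypothesis of Lemma 6 relating $s$, $c$ and $\eta_\epsilon(L)$, which is stated outside this excerpt.

```latex
\begin{align*}
t &= \sqrt{r^2 + s^2} = s\sqrt{1 + 1/c^2}, \\
s &\le s\sqrt{1 + 1/c^2} \le s\,(1 + 1/c)
   && \text{since } 1 + 1/c^2 \le (1 + 1/c)^2, \\
\frac{rs}{t} &= \frac{s^2}{c\,t} \ge \frac{s^2}{c \cdot s\,(1 + 1/c)} = \frac{s}{c+1}.
\end{align*}
```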

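Since the continuous Gaussian distribution is spherical, we can consider
expressing it in an orthonormal basis with one vector in the direction of $v$. When
expressed in this basis, we get the event $w \notin H'_v$ exactly when the coefficient
in the direction of $v$ (which is distributed close to the 1-dimensional Gaussian
$\mathcal{N}(0, t^2/2\pi)$) exceeds $\|v\|(1 + \frac{4}{c})/2$ in magnitude. Hence we have

$$\Pr[w \in H'_v] \le \Pr_{\alpha \leftarrow \mathcal{N}(0,t^2/2\pi)}\Big[|\alpha| \le \frac{\|v\|(1 + 4/c)}{2}\Big] \cdot \frac{1+\epsilon}{1-\epsilon}
\le \mathrm{erf}\Big(\frac{\|v\|\sqrt{\pi}\,(1 + 4/c)}{2s}\Big) \cdot \frac{1+\epsilon}{1-\epsilon}.$$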


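On the other hand, consider the conditional probability $\Pr[w \in H'_v \mid u \in H_v]$:
Let $H''_v = \frac{4}{c} H_v$, then if $u \in H_v$ and $z \in H''_v$, then it must be the case that
$w = u + z \in H'_v$. As before, we can consider the continuous Gaussian on $z$ in
an orthonormal basis with one vector in the direction of $v$, and we get

$$\Pr[w \in H'_v \mid u \in H_v] \ge \Pr[z \in H''_v \mid u \in H_v] = \Pr[z \in H''_v]
= \Pr_{\beta \leftarrow \mathcal{N}(0,r^2/2\pi)}\Big[|\beta| \le \frac{2\|v\|}{c}\Big] = \mathrm{erf}\Big(\frac{2\|v\|\sqrt{\pi}}{s}\Big).$$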


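Putting the last two bounds together, we get

$$\mathrm{erf}\Big(\frac{\|v\|\sqrt{\pi}\,(1 + 4/c)}{2s}\Big) \cdot \frac{1+\epsilon}{1-\epsilon} \ge \Pr[w \in H'_v] \ge \Pr[u \in H_v] \cdot \Pr[w \in H'_v \mid u \in H_v]
\ge \Pr[u \in H_v] \cdot \mathrm{erf}\Big(\frac{2\|v\|\sqrt{\pi}}{s}\Big),$$

from which we conclude that

$$\Pr[u \in H_v] \le \frac{\mathrm{erf}\big(\|v\|\sqrt{\pi}\,(1 + 4/c)/2s\big)}{\mathrm{erf}\big(2\|v\|\sqrt{\pi}/s\big)} \cdot \frac{1+\epsilon}{1-\epsilon},$$

as needed.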


A.2 Proof of Lemma 10

Proof. Fix some $z \in \mathbb{Z}^n$. The probability mass assigned to $z$ by $\mathcal{E}_{X,s'}$ is the
probability of drawing a random vector according to the discrete Gaussian $\mathcal{D}_{\mathbb{Z}^m,s'}$
and hitting some $v \in \mathbb{Z}^m$ for which $Xv = z$. In other words, this is exactly the
probability mass assigned by $\mathcal{D}_{\mathbb{Z}^m,s'}$ to the coset $A_z$. Below let $T = T(X) \subseteq \mathbb{R}^m$
be the linear subspace containing the lattice $A$, and $T_z = T_z(X) \subseteq \mathbb{R}^m$ be the
affine subspace containing the coset $A_z$:

$$T = T(X) = \{v \in \mathbb{R}^m : X \cdot v = 0\}, \quad \text{and} \quad T_z = T_z(X) = \{v \in \mathbb{R}^m : X \cdot v = z\}.$$
Let $Y$ be the pseudoinverse of $X$ (i.e. $XY^T = I_n$ and the rows of $Y$ span the
same linear sub-space as the rows of $X$). Let $u_z = Y^T z$, and we note that $u_z$
is the point in the affine space $T_z$ closest to the origin: To see this, note that
$u_z \in T_z$ since $X \cdot u_z = X \cdot Y^T z = z$. In addition, $u_z$ belongs to the row space
of $Y$, so also to the row space of $X$, and hence it is orthogonal to $T$.
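These properties of $u_z$ can be verified exactly on a small instance. The sketch below uses a hypothetical 2-by-3 integer matrix `X` and rational arithmetic, computing $u_z = X^T (XX^T)^{-1} z$, which is the same as $Y^T z$ for the pseudoinverse $Y = (XX^T)^{-1}X$; the specific matrix, target and helper names are assumptions for illustration.

```python
from fractions import Fraction as F

X = [[1, 2, 0],
     [0, 1, 3]]          # toy 2-by-3 integer matrix of full row rank
z = [4, 5]

def gram_inverse(X):
    """Exact inverse of the 2x2 Gram matrix X X^T."""
    g = [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]
    det = F(g[0][0] * g[1][1] - g[0][1] * g[1][0])
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def apply(X, v):
    """Matrix-vector product X v."""
    return [sum(F(x) * vi for x, vi in zip(row, v)) for row in X]

def sqnorm(v):
    """Squared Euclidean norm."""
    return sum(F(vi) ** 2 for vi in v)

Ginv = gram_inverse(X)
c = [sum(Ginv[i][j] * z[j] for j in range(2)) for i in range(2)]
u_z = [sum(X[i][k] * c[i] for i in range(2)) for k in range(3)]   # u_z = X^T (X X^T)^{-1} z

v = [0, 2, 1]             # an integer point of the coset: X v = z
kernel = [6, -3, 1]       # spans the kernel T of X
print(u_z, sqnorm(v) == sqnorm(u_z) + sqnorm([a - b for a, b in zip(v, u_z)]))
```

The Pythagorean identity checked on the last line is the one used in the next step of the proof.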

Since $u_z$ is the point in the affine space $T_z$ closest to the origin, it follows
that for every point in the coset $v \in A_z$ we have $\|v\|^2 = \|u_z\|^2 + \|v - u_z\|^2$,
and therefore

$$\rho_{s'}(v) = e^{-\pi(\|v\|/s')^2} = e^{-\pi(\|u_z\|/s')^2} \cdot e^{-\pi(\|v - u_z\|/s')^2} = \rho_{s'}(u_z) \cdot \rho_{s'}(v - u_z).$$

This, in turn, implies that the total mass assigned to $A_z$ by $\rho_{s'}$ is

$$\rho_{s'}(A_z) = \sum_{v \in A_z} \rho_{s'}(v) = \rho_{s'}(u_z) \cdot \sum_{v \in A_z} \rho_{s'}(v - u_z) = \rho_{s'}(u_z) \cdot \rho_{s'}(A_z - u_z).$$

Fix one arbitrary point $w_z \in A_z$, and let $\delta_z$ be the distance from $u_z$ to that
point, $\delta_z = u_z - w_z$. Since $A_z = w_z + A$, we get $A_z - u_z = A - \delta_z$, and together
with the equation above we have:

$$\rho_{s'}(A_z) = \rho_{s'}(u_z) \cdot \rho_{s'}(A_z - u_z) = \rho_{s'}(u_z) \cdot \rho_{s'}(A - \delta_z)
= \rho_{s'}(u_z) \cdot \rho_{s',\delta_z}(A) \in \rho_{s'}(u_z) \cdot \rho_{s'}(A) \cdot \Big[\frac{1-\epsilon}{1+\epsilon}, 1\Big]. \quad (3)$$

As a last step, recall that $u_z = Y^T z$ where $YY^T = (XX^T)^{-1}$. Thus

$$\rho_{s'}(u_z) = \rho_{s'}(Y^T z) = \exp(-\pi |z^T Y Y^T z| / s'^2) = \exp\big(-\pi |z^T ((s'X)(s'X)^T)^{-1} z|\big) = \rho_{(s'X^T)}(z).$$


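Putting everything together we get

$$\mathcal{E}_{X,s'}(z) = \mathcal{D}_{\mathbb{Z}^m,s'}(A_z) = \frac{\rho_{s'}(A_z)}{\rho_{s'}(\mathbb{Z}^m)} \in \rho_{(s'X^T)}(z) \cdot \frac{\rho_{s'}(A)}{\rho_{s'}(\mathbb{Z}^m)} \cdot \Big[\frac{1-\epsilon}{1+\epsilon}, 1\Big].$$

The term $\frac{\rho_{s'}(A)}{\rho_{s'}(\mathbb{Z}^m)}$ is a normalization factor independent of $z$, hence the
probability mass $\mathcal{E}_{X,s'}(z)$ is proportional to $\rho_{(s'X^T)}(z)$, up to some “deviation factor”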
Abstract. This paper investigates the mathematical structure of the
“Isomorphism of Polynomial with One Secret” problem (IP1S). Our purpose
is to understand why for practical parameter values of IP1S most
random instances are easily solvable (as first observed by Bouillaguet et
al.). We show that the structure of the equations is directly linked to a
matrix derived from the polar form of the polynomials. We prove that
in the likely case where this matrix is cyclic, the problem can be solved
in polynomial time - using an algorithm that unlike previous solving
techniques is not based upon Gröbner basis computation.

1 Introduction 

Multivariate cryptography is a subarea of cryptography, the development of
which was initiated in the late 80 ’s m and was motivated by the search for 
alternatives to asymmetric cryptosystems based on algebraic number theory. 
RSA and more generally most existing asymmetric schemes based on algebraic 
number theory use the difficulty of solving one univariate equation over a large 
group (e.g. $x^e = y$ where $e$ and $y$ are known). Multivariate cryptography, for
its part, aims at using the difficulty of solving systems of multivariate equations over
a small field.

A limited number of multivariate problems have emerged that can be reasonably
conjectured to possess intractable instances of relatively small size. Two
classes of multivariate problems underlie most multivariate cryptosystems
proposed so far: the MQ problem of solving a multivariate system of $m$ quadratic
equations in $n$ variables over a finite field $\mathbb{F}_q$ - that was shown to be NP-complete
even over $\mathbb{F}_2$ for $m \approx n$ [10] - and the broad family of the so-called isomorphism
of polynomials (IP) problems.

Isomorphism of Polynomial problems can be roughly described as the equivalence
of multivariate polynomial systems of equations up to linear (or affine)
bijective changes of variables. Two separate subfamilies of IP problems can be
distinguished: isomorphism of polynomials with two secrets (IP2S for short) and
isomorphism of polynomials with one secret (IP1S for short). In a little more
detail, given two $m$-tuples $a = (a_1, \ldots, a_m)$ and $b = (b_1, \ldots, b_m)$ of polynomials
in $n$ variables over $\mathbb{K} = \mathbb{F}_q$, IP2S consists of finding two linear bijective
transformations $S$ of $\mathbb{K}^n$ and $T$ of $\mathbb{K}^m$, such that $b = T \circ a \circ S$. Respectively,
(computational) IP1S consists of finding one linear bijective transformation $S$ of $\mathbb{K}^n$,
such that $b = a \circ S$. Many variants of both problems can be defined depending
on the value of the triplet $(n, m, q)$, the degree $d$ of the polynomial equations of $a$
and $b$, whether these polynomials are homogeneous or not, whether $S$ and $T$ are
affine or linear, etc. It turns out that there are considerable security and simplicity
advantages in restricting oneself, for cryptographic applications, to instances
involving only homogeneous polynomials of degree $d$ and linear transformations
$S$ and $T$. For performance reasons, the quadratic case $d = 2$ is most frequently
encountered in cryptography. Due to the existence of an efficient canonical reduction
algorithm for quadratic forms, instances such that $m \ge 2$ must then be
considered. The cubic case $d = 3$ is also sometimes considered; then instances
such that $m = 1$ are generally encountered.
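To make the quadratic IP1S setting concrete, the following sketch builds a toy instance with $m = 2$ forms in $n = 3$ variables. The field size, the random forms and the secret matrix are hypothetical toy values; representing each quadratic form by a matrix $A$ with $a(x) = {}^t x A x$ is one convenient choice, under which $a \circ S$ has matrix ${}^tS A S$.

```python
import random

Q = 7   # toy odd prime field size (hypothetical parameter)

def mat_mul(A, B):
    """Matrix product over F_Q."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % Q
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def evaluate(A, x):
    """Value of the quadratic form x -> sum_{i,j} A[i][j] x_i x_j over F_Q."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) % Q

def compose(A, S):
    """Matrix of the form a o S, i.e. x -> a(S x): it equals S^T A S."""
    return mat_mul(transpose(S), mat_mul(A, S))

def apply(S, x):
    """Image S x of the vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) % Q for row in S]

rng = random.Random(1)
n, m = 3, 2
a = [[[rng.randrange(Q) for _ in range(n)] for _ in range(n)] for _ in range(m)]
S = [[1, 2, 0], [0, 1, 3], [0, 0, 1]]      # unit upper triangular, hence invertible
b = [compose(A, S) for A in a]             # public instance (a, b); S is the secret

x = [1, 2, 3]
print([evaluate(b[i], x) == evaluate(a[i], apply(S, x)) for i in range(m)])
```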

Many asymmetric cryptosystems whose security is related to the hardness of
special trapdoor instances of IP2S were proposed, in which all or part of the
$m$-tuple of polynomials $b$ plays the role of the public key and is related by secret
linear bijections $S$ and $T$ to a specially crafted, easy to invert multivariate
polynomial mapping $a$. Most of these systems, e.g. Matsumoto and Imai's seminal
multivariate scheme C* but also reinforced variants such as SFLASH and
HFE [18,16], were shown to be weak because the use of trapdoor instances of
IP2S with specific algebraic properties considerably weakens the general IP2S
problem. A survey of the status of the IP2S problems and improved techniques
for solving homogeneous instances are presented in [1] and [4].

The IP1S problem was introduced in [16] by Patarin, who proposed in the
same paper a zero-knowledge asymmetric authentication scheme named the IP 
identification scheme with one secret (IP1S scheme for short). This authenti- 
cation scheme is inspired by the well known zero-knowledge proof for Graph 
Isomorphism by Goldreich et al. m- It can be converted into a (less practical) 
asymmetric signature scheme using the Fiat-Shamir transformation. The IP1S 
problem and the related identification scheme were believed to possess several 
attractive features: 

- The conjecture that the IP1S problem is not solvable in polynomial time was
supported by the proof in [IT] that the quadratic version of IP1S (QIP1S for
short) is at least as hard as the Graph Isomorphism problem (GI)¹, one of
the most extensively studied problems in complexity theory. While the GI
problem is not believed to be NP-complete, since it lies in both NP and co-AM, and
hard instances of GI are difficult to construct for small parameter values, GI
is generally believed not to be solvable in polynomial time.

- Unlike the encryption or signature schemes based on IP2S mentioned above,
the IP1S scheme does not use special trapdoor instances of the IP1S problem
and therefore its security is directly related to the intractability of general
IP1S instances.

1 However, as mentioned in the conclusion of this paper, if the flaw recently discovered
by the authors in the corresponding proof in [IT] is confirmed, this casts some doubt
on the claim that Quadratic IP1S is indeed as hard as GI.

The IP1S problem also has some loose connections with the multivariate signature
scheme UOV m, which has until now survived remarkably well all advances
in the cryptanalysis of multivariate schemes. In UOV the public quadratic
function $b$ is related to the secret quadratic function $a$ by the equation $b = a \circ S$,
where both $a$ and $S$ are unknown, whereas only $S$ is unknown in the IP1S problem.

Former Results. Initial assessments of the security of practical instances of
the IP1S problem suggested that relatively small public key and secret sizes -
typically about 256 bits - could suffice to ensure a security level of more than $2^{64}$.
The IP1S scheme therefore appeared to compare favorably with many other
zero-knowledge authentication schemes, e.g. [21,22,20]. Moreover, despite advances in
solving some particular instances of the IP1S problem, in particular Perret's
Jacobian algorithm² [TTJj, the four challenge parameter values proposed in 1996
[16] (with $q = 2$ or $2^{16}$, $d = 2$ and $m = 2$, or $d = 3$ and $m = 1$) remained
unbroken until 2011.

Significant advances on solving IP1S instances that are practically relevant for
cryptography were made quite recently [HE]. Dubois in [7] and the authors of [2]
were the first to notice that the IP1S problem induces numerous linear equations
in the coefficients of the matrix of $S$ and of the inverse mapping $T = S^{-1}$.
When $m \ge 3$, the number $mn^2$ of obtained linear equations is substantially
larger than the number $2n^2$ of variables. While the system cannot have full
rank since the dimension of the vector space of solutions is at least 1, it can
heuristically be expected to have a very small vector space of solutions that can
be tried exhaustively. The authors of [2] even state that they “empirically find
one solution (when the polynomials are randomly chosen)”.
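One way to see where these linear equations come from is sketched below, writing $\mathcal{P}(A_i) = {}^tA_i + A_i$ for the polar matrix of the form $a_i$ (the notation of Section 2.1) and identifying $S$ and $T$ with their matrices. This is an illustrative derivation based on the polar equations; it is not claimed to be the exact presentation of [2] or [7].

```latex
\begin{align*}
b_i = a_i \circ S
  &\;\Longrightarrow\; \mathcal{P}(B_i) = {}^t\!S\,\mathcal{P}(A_i)\,S
  && (i = 1,\ldots,m) \\
  &\;\Longrightarrow\; \mathcal{P}(B_i)\,T = {}^t\!S\,\mathcal{P}(A_i)
  && \text{with } T = S^{-1}.
\end{align*}
```

Each of the $m$ matrix identities on the last line is linear in the $2n^2$ unknown entries of $S$ and $T$ and contributes $n^2$ scalar equations, which gives the count of $mn^2$ equations quoted above.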

Therefore the most interesting remaining case appears to be $m = 2$. It is
shown in [2] that the vector space of solutions of the linear equations is then
isomorphic to the commutant of a non-singular $n \times n$ matrix $M$ and that its
dimension $r$ is lower bounded by $n$ in odd characteristic and $2n$ in even characteristic.
The reported computer experiments indicate that $r$ is extremely likely
to be close to these lower bounds in practice. While for typical values of $q^n$ the
vector space of solutions is too large to be exhaustively searched, one can try
to solve the equation $b = a \circ S$ over this vector space. This provides a system
of quadratic equations in a restricted variable set of $r \approx n$ (resp. $r \approx 2n$) coordinates.
The approach followed in [2] in order to solve this system consisted of
applying Gröbner basis algorithms such as Faugère's F4 [8] and related computer
algebra tools such as FGLM [9]. This method turned out to be quite successful:
all the IP1S challenges proposed by Patarin were eventually broken in computing
times ranging from less than 1 s to 1 month. This led the authors of [2] to
conclude that “[the] IP1S-Based identification scheme is no longer competitive
with respect to other combinatorial-based identification schemes”. However, the
heuristic explanation suggested in [2], namely that the obtained system was so
massively overdefined that a random system with the same number of random
quadratic equations would be efficiently solvable in time $O(n^9)$ with overwhelming
probability, was later on shown to be false by one of the authors of [2], due
to an overestimate of the number of linearly independent quadratic equations.

2 This algorithm recovers $mn$ linear equations in the coefficients of $S$ and is therefore
suited for solving IP1S instances such that $m \approx n$.

This is addressed in Bouillaguet's PhD dissertation [1] where the results of
[2] are revisited. The main discrepancy with the findings of [2] is the observation
that in all the reported experiments in odd and even characteristic, the
number of linearly independent quadratic equations, which was supposed in [2]
to be close to $n^2$, is actually bounded above by a small multiple of $n$ and only
marginally larger than $r$. The author writes “This means that we cannot argue
that solving these equations is doable in polynomial time. An explanation of this
phenomenon has eluded us so far.” Despite the surprisingly small number of
linearly independent quadratic equations, nearly all instances are confirmed to
be efficiently solvable for all practical values of $n$ when the size $q$ of the field
is sufficiently small ($q = 2$ or 3) and still solvable efficiently up to values of $n$ of
about 20. The author writes “For instance, when $q = 2$ and $n = 128$ we are
solving a system of 256 quadratic equations in 256 variables over $\mathbb{F}_2$. When the
equations are random this is completely infeasible. In our case, it just takes 3
minutes! We have no clear explanation of this phenomenon.”


Our Contribution. The lack of explanation for the success of the attack - more
precisely the puzzling fact that the number of linearly independent quadratic
equations is close to $n$ in odd characteristic and to $2n$ in even characteristic and
the even more puzzling fact that nearly all instances are nevertheless solvable -
motivated our research on IP1S. We revisited the former analysis and eventually
found an algebraic explanation of why most random instances of the quadratic
IP1S problem are efficiently solvable. This explanation leads to a new method (not based
on Gröbner basis computations) to directly solve these instances. Our analysis
shows in particular that in the likely cases where the characteristic is odd and
the matrix $M$ is cyclic, or the characteristic is even and $M$ is similar to a block-wise
diagonal matrix with two equal cyclic $\frac{n}{2} \times \frac{n}{2}$ diagonal blocks, the quadratic
equations split up in an appropriate basis into small triangular quadratic systems
that can be solved efficiently in polynomial time. The highlighted structure of
the quadratic equations seems to be the essential reason why Gröbner basis
computations behave so well on most instances.

The rest of this paper is organized as follows. In Section 2 we present the
problem IP1S, its background and some major mathematical results used in
the following sections. We then discuss in Sections 3 and 4 the resolution of the
problem over finite fields of odd, resp. even characteristic.
2 The Isomorphism of Polynomial Problem with One Secret

2.1 Notations and First Definitions 

Let $\mathbb{K}$ be a field; for practical considerations, we shall assume that $\mathbb{K}$ is the finite
field $\mathbb{F}_q$ with $q$ elements, although most of the discussion is true in the general
case.

A (homogeneous) quadratic form in $n$ variables over $\mathbb{K}$ is a homogeneous
polynomial of degree two, of the form $q = \sum_{i,j=1,\ldots,n} a_{i,j} x_i x_j$, where the
coefficients $a_{i,j}$ belong to $\mathbb{K}$. For simplicity, we write $x = (x_i)$ for the vector with
coordinates $x_i$. The quadratic form $q$ can be described by the matrix with general
term $a_{i,j}$. Note that the matrix representation of a quadratic form is not unique:
two matrices represent the same quadratic form if, and only if, their difference is
skew-symmetric with zero diagonal.

The polar form associated to a quadratic form $q$ is the bilinear form $b = \mathcal{P}(q)$
defined by $b(x, y) = q(x + y) - q(x) - q(y)$. This is a symmetric bilinear form.
This can be used to give an intrinsic definition of quadratic forms (which is useful
to abstract changes of bases from some proofs below): given a vector space $V$, a
quadratic form over $V$ is a function $q : V \to \mathbb{K}$ such that

(i) for all $x \in V$ and $\lambda \in \mathbb{K}$, $q(\lambda x) = \lambda^2 q(x)$;

(ii) the polar form $\mathcal{P}(q)$ is bilinear.

For any matrix $A$, let ${}^tA$ be the transpose matrix of $A$ and $\mathcal{P}(A)$ be the
symmetric matrix ${}^tA + A$. Then if $q$ is a quadratic form with matrix $A$, its polar
form has matrix $\mathcal{P}(A)$. The quadratic form $q$ is regular if its polar form is not
singular, i.e. if it defines a bijection from $V$ to its dual. In general, we define the
kernel of a quadratic form to be the kernel of its polar form.

From the definition of b = V(q) we derive the polarity identity 

2 q(x) = b(x,x). (1) 

This identity obviously behaves very differently when 2 is a unit in $\mathbb{K}$ and
when $2 = 0$ in $\mathbb{K}$. This forces us to use some quite different methods in both
cases.

If 2 is invertible in $\mathbb{K}$ then the polarity identity (1) allows recovery of a
quadratic form from its polar bilinear form. In other words, quadratic forms
in $n$ variables correspond to symmetric matrices.

Conversely, if $2 = 0$, then the polarity identity reads as $b(x, x) = 0$; in other
words, the polar form is an alternating bilinear form. In this case, equality of
polar forms does not imply equality of quadratic forms. Define $\Delta(A)$ as the
matrix of diagonal entries of the matrix $A$. Then the quadratic forms with matrices $A$ and $B$ are
equal if, and only if, $\mathcal{P}(A) = \mathcal{P}(B)$ and $\Delta(A) = \Delta(B)$.
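The difference between the two cases can be checked on small matrices. The sketch below is illustrative only: the helper names and the example matrices are hypothetical, with $p = 7$ standing for odd characteristic and $p = 2$ for even characteristic.

```python
def polar(A, p):
    """Matrix tA + A of the polar form of the quadratic form with matrix A, mod p."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) % p for j in range(n)] for i in range(n)]

def diagonal(A, p):
    """Diagonal entries of A mod p (the coefficients of the squares x_i^2)."""
    return [A[i][i] % p for i in range(len(A))]

def q_value(A, x, p):
    """Value q(x) of the quadratic form with matrix A."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) % p

def b_value(P, x, y, p):
    """Value b(x, y) of the bilinear form with matrix P."""
    n = len(x)
    return sum(P[i][j] * x[i] * y[j] for i in range(n) for j in range(n)) % p

# Odd characteristic (p = 7): the polarity identity 2 q(x) = b(x, x).
A7 = [[1, 3], [0, 5]]
x = [2, 6]
lhs = (2 * q_value(A7, x, 7)) % 7
rhs = b_value(polar(A7, 7), x, x, 7)

# Characteristic 2: x1^2 + x1*x2 and x1*x2 + x2^2 share a polar form but differ.
A2 = [[1, 1], [0, 0]]
B2 = [[0, 1], [0, 1]]
print(lhs == rhs, polar(A2, 2) == polar(B2, 2), diagonal(A2, 2) == diagonal(B2, 2))
```

In the characteristic 2 example the polar matrices coincide, so only the diagonal equation distinguishes the two forms.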

2.2 The Quadratic IP1S Problem 

We now state the quadratic IP1S problem and give an account of its current 
status after the recent work of [5] and pQ. 
Problem 1 (Quadratic IP1S). Given two $m$-tuples $a = (a_1, \ldots, a_m)$ and
$b = (b_1, \ldots, b_m)$ of quadratic homogeneous forms in $n$ variables over $\mathbb{K} = \mathbb{F}_q$, find
a non-singular linear mapping $S \in GL_n(\mathbb{K})$ (if any) such that $b = a \circ S$, i.e.
$b_i = a_i \circ S$ for $i = 1, \ldots, m$.

Remark 1. In order not to unnecessarily complicate the presentation, our definition
of the IP1S problem slightly differs³ from the initial statement of the
problem introduced in [16]. Though the name “quadratic homogeneous IP1S”
might be more accurate to refer to the exact class of instances we consider, we
will name it quadratic IP1S or IP1S in the sequel.

If we denote by $A_i$, resp. $B_i$, any $n \times n$ matrices representing the $a_i$, resp. the $b_i$,
and denote by $X$ the matrix representation of $S$, the conditions for the equality
of two quadratic forms given in Section 2.1 allow us to immediately translate the
quadratic IP1S problem into equivalent matrix equations.

- If the characteristic of $\mathbb{K}$ is odd: the problem is equivalent to finding an
invertible matrix $X$ that satisfies the $m$ polar equations: $\mathcal{P}(B_i) = {}^tX \mathcal{P}(A_i) X$.

- If the characteristic of $\mathbb{K}$ is even: the problem is equivalent to finding an
invertible matrix $X$ that satisfies the polar and the diagonal equations:
$\mathcal{P}(B_i) = {}^tX \mathcal{P}(A_i) X$; $\Delta(B_i) = \Delta({}^tX A_i X)$.

In the following sections we will consider IP1S instances such that $m = 2$, which
are believed to represent the most “interesting” instances of IP1S, as recalled
above. Matrix pencils, which can be viewed as $n \times n$ matrices whose coefficients
are polynomials of degree 1 of $\mathbb{K}[\lambda]$, represent a convenient way to capture the
above equations in a more compact way. If we denote by $A$ and $B$ the matrix
pencils $\lambda A_0 + A_1$ and $\lambda B_0 + B_1$, and by extension $\mathcal{P}(A)$ and $\mathcal{P}(B)$ the symmetric
matrix pencils $\lambda \mathcal{P}(A_0) + \mathcal{P}(A_1)$ and $\lambda \mathcal{P}(B_0) + \mathcal{P}(B_1)$, the two polar equations
can be written in one equation: $\mathcal{P}(B) = {}^tX \mathcal{P}(A) X$. However, as detailed in the
next section, the theory of pencils is far more powerful than just a convenient
notation for pairs of matrices. See for instance [3].


2.3 Mathematical Background 

In this Section we briefly outline a few known definitions and results related to
the classification of matrices and matrix pencils and known methods for solving
matrix equations that are relevant for the investigation of the IP1S problem.

3 While in [16] the isomorphism of two m-tuples of quadratic polynomials comprising also
linear and constant terms through a non-singular affine transformation was considered,
we consider here the isomorphism of two m-tuples of quadratic forms through
a non-singular linear transformation. This replacement of the original definition by
a simplified definition is justified by the fact that all instances of the initial problem
can be shown to be either easily solvable due to the lower degree homogeneous
equations they induce or efficiently reducible to a homogeneous quadratic instance.
Basic Facts about Matrices. Two matrices $A$ and $B$ are similar if there
exists an invertible matrix $P$ such that $P^{-1}AP = B$ and congruent if there
exists an invertible $P$ such that ${}^tPAP = B$.

The matrix A is called cyclic if its minimal and characteristic polynomials are 
equal. 

For any matrix $A$, the commutant of $A$ is the algebra $C_A$ of all matrices
commuting with $A$. It contains the algebra $\mathbb{K}[A]$, and this inclusion is an equality
if, and only if, $A$ is cyclic.
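The criterion can be tested numerically: since $\mathbb{K}[A]$ has dimension equal to the degree of the minimal polynomial, $A$ is cyclic exactly when its commutant has dimension $n$. The sketch below computes that dimension over a prime field by solving $AX - XA = 0$ with Gaussian elimination; the function names and the example matrices are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime), by Gauss-Jordan elimination."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def commutant_dimension(A, p):
    """Dimension over F_p of {X : AX = XA}, via the linear system AX - XA = 0."""
    n = len(A)
    rows = []
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of AX - XA: sum_k A[i][k] X[k][j] - X[i][k] A[k][j].
            row = [0] * (n * n)
            for k in range(n):
                row[k * n + j] = (row[k * n + j] + A[i][k]) % p
                row[i * n + k] = (row[i * n + k] - A[k][j]) % p
            rows.append(row)
    return n * n - rank_mod_p(rows, p)

companion = [[0, 0, 1], [1, 0, 2], [0, 1, 3]]   # companion matrices are cyclic
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # far from cyclic for n > 1
print(commutant_dimension(companion, 5), commutant_dimension(identity, 5))
```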

For any matrix $A$, let $\prod_i p_i^{e_i}$ be the prime factorization of its minimal polynomial.
Then $\mathbb{K}[A]$ is the direct product of the algebras $\mathbb{K}[x]/p_i(x)^{e_i}$; each of these
factors is a local algebra with residual field equal to the extension field $\mathbb{K}[x]/p_i$.


Pencils of Bilinear and Quadratic Forms. Let $V$ be a $\mathbb{K}$-vector space and
$Q(V)$ be the vector space of all quadratic forms on $V$. A projective pencil of
quadratic forms on $V$ is a projective line in $\mathbb{P}Q(V)$, i.e. a two-dimensional
subspace of $Q(V)$. As a projective pencil is the image of the projective line $\mathbb{P}^1$
in $Q(V)$, it is determined by the images of the points $\infty$ and $0$ in $\mathbb{P}^1$, which we
write $A_\infty$ and $A_0$.

An affine pencil of quadratic forms is an affine line in $Q(V)$, or equivalently
a pair of elements of $Q(V)$. The affine pencil with basis $(A_\infty, A_0)$ may also be
written as a polynomial matrix $A_\lambda = A_0 + \lambda A_\infty$. Given a projective pencil $A$
of $Q(V)$, the choice of any basis $(A_\infty, A_0)$ of $A$ determines an affine pencil.

A projective pencil is regular if it contains at least one regular quadratic
form. An affine pencil $(A_\infty, A_0)$ is regular if $A_\infty$ is regular; it is degenerate if the
intersection of the kernels of the quadratic forms $A_\lambda$ is nontrivial.

If an affine pencil is non-degenerate, then the polynomial $\det A_\lambda$ is non-zero;
choosing any $\lambda$ which is not a root of this polynomial proves that the associated
projective pencil is regular (over $K$ itself if it is infinite, and over a finite extension
of $K$ if it is finite). This gives a basis of the projective pencil which turns the
affine pencil into a regular one. We shall therefore assume all affine pencils to be
regular.

Two pencils $A$, $B$ of quadratic forms are congruent if there exists an invertible
matrix $X$ such that ${}^tXA_\lambda X = B_\lambda$. The case $m = 2$ of the quadratic IP1S problem
reduces to the pencil congruence problem: given two affine pencils $A$ and $B$,
known to be congruent, exhibit a suitable congruence matrix $X$.

We first note that the IP1S problem easily reduces to the case where both 
pencils are regular. Namely, if one (and therefore both) is degenerate, then we 
may quotient out both spaces by the (isomorphic) kernels of the pencils; this 
defines non-degenerate affine pencils on the quotient vector spaces, which are 
still congruent. Since the associated projective pencils are regular, a change of 
basis in the pencils (and maybe an extension of scalars) brings us to the case of 
two regular affine pencils. 

We define pencils of bilinear forms in the same way as pencils of quadratic
forms. The pencil $b_\lambda = b_0 + \lambda b_\infty$ is regular if $b_\infty$ is; in this case, the characteristic
endomorphism of the pencil is the endomorphism $f = b_\infty^{-1} \circ b_0$.

The following lemma allows us to decompose pencils as direct sums, with each
factor having a characteristic endomorphism whose minimal polynomial is a power
of an irreducible polynomial.

Lemma 1. Let b be a regular pencil of symmetric bilinear forms. Then all pri- 
mary subspaces of the characteristic endomorphism f are orthogonal with respect 
to all forms of b. 

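Proof. We have to prove the following: given any two mutually prime factors $p, q$
of the minimal polynomial of $f$ and any $x, y \in V$ such that $p(f)(x) = 0$ and
$q(f)(y) = 0$, then for all $\lambda$, we have $b_\lambda(x, y) = 0$. Since
$b_\lambda(x, y) = b_\infty(x, (f + \lambda)y)$ and $q(f)$ also annihilates $(f + \lambda)y$,
it is enough to show that $b_\infty(x, y) = 0$.

Since $p, q$ are mutually prime, there exist $u, v$ such that $up + vq = 1$. Note
that, for all $x, y \in V$, we have $b_\infty(x, fy) = b_0(x, y) = b_0(y, x) = b_\infty(fx, y)$;
therefore, all elements of $K[f]$ are self-adjoint with respect to $b_\infty$. From this we
derive the following:

$$
\begin{aligned}
b_\infty(x, y) &= b_\infty\bigl(x,\, u(f)p(f)y + v(f)q(f)y\bigr) \\
               &= b_\infty\bigl(u(f)p(f)x,\, y\bigr) + b_\infty\bigl(x,\, v(f)q(f)y\bigr) \\
               &= 0.
\end{aligned}
\tag{2}
$$

□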


Explicit Similarity of a Matrix and Its Transpose. The next result is
used extensively in the sequel to deal with symmetric pencils. Although this
result is classic, we are interested in the explicit form given below.

Theorem 1. For any matrix $M$, there exists a non-singular symmetric matrix
$T$ such that ${}^tMT = TM$.


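Proof. Using primary decomposition for $M$, we may assume that it is of the form

$$
M = \begin{pmatrix} M_0 & 1 & & \\ & M_0 & \ddots & \\ & & \ddots & 1 \\ & & & M_0 \end{pmatrix}, \tag{3}
$$

where $M_0$ is the companion matrix of a polynomial
$p(\lambda) = \lambda^n + p_{n-1}\lambda^{n-1} + \cdots + p_0$. We
then define matrices $T_0$ and $T$ by

$$
T_0 = \begin{pmatrix} p_1 & \cdots & p_{n-1} & 1 \\ \vdots & \iddots & \iddots & \\ p_{n-1} & 1 & & \\ 1 & & & 0 \end{pmatrix},
\qquad
T = \begin{pmatrix} & & T_0 \\ & \iddots & \\ T_0 & & \end{pmatrix}. \tag{4}
$$

One can easily verify that $T_0$ is invertible, symmetric and ${}^tM_0T_0 = T_0M_0$, and
that the same is true for $T$ and $M$. □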
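
The companion-matrix case of Theorem 1 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the companion matrix carries ones on the superdiagonal and the negated coefficients in its last row, works over a small prime field chosen only for illustration, and uses hypothetical helper names (`companion`, `hankel_symmetrizer`, `is_symmetrizer`).

```python
def companion(coeffs, p):
    """Companion matrix over GF(p) of x^n + c[n-1] x^(n-1) + ... + c[0].

    Convention (an assumption of this sketch): ones on the superdiagonal,
    negated coefficients in the last row.
    """
    n = len(coeffs)
    m = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        m[i][i + 1] = 1
    m[n - 1] = [(-c) % p for c in coeffs]
    return m


def hankel_symmetrizer(coeffs, p):
    """Hankel matrix T0 with T0[i][j] = c[i+j+1], where c[n] = 1 and c[k] = 0 for k > n."""
    n = len(coeffs)
    ext = list(coeffs) + [1]  # append the leading coefficient of the monic polynomial
    return [[ext[i + j + 1] % p if i + j + 1 <= n else 0 for j in range(n)]
            for i in range(n)]


def matmul(a, b, p):
    """Matrix product over GF(p)."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def is_symmetrizer(m, t, p):
    """Check that t is symmetric and that (transpose of m) * t == t * m over GF(p)."""
    return t == transpose(t) and matmul(transpose(m), t, p) == matmul(t, m, p)


if __name__ == "__main__":
    p = 7
    coeffs = [3, 0, 5, 1]  # x^4 + x^3 + 5x^2 + 3 over GF(7), an arbitrary example
    print(is_symmetrizer(companion(coeffs, p), hankel_symmetrizer(coeffs, p), p))
```

The matrix $T_0$ built here is anti-triangular with ones on the anti-diagonal, which is why it is invertible regardless of the coefficients.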

3 IP1S in Characteristic Different from Two

Let $\mathbb{K}$ be a field of characteristic different from two⁴. In this case, the polarity
identity (1) identifies quadratic forms with symmetric bilinear forms, or again
with symmetric matrices with entries in $\mathbb{K}$. We shall therefore write a quadratic
pencil $A$ as $A_\lambda = A_0 + \lambda A_\infty$, where $A_0$ and $A_\infty$ are symmetric matrices.

Proposition 1. Let $A_\lambda = A_0 + \lambda A_\infty$, $B_\lambda = B_0 + \lambda B_\infty$ be two regular affine
pencils.

(i) If $A_\lambda$ is congruent to $B_\lambda$, then the characteristic matrices

$$M_A = A_\infty^{-1}A_0 \quad\text{and}\quad M_B = B_\infty^{-1}B_0$$

are similar.

(ii) Assume that $M_A$ and $M_B$ are similar and choose $P$ such that $P^{-1}M_AP = M_B$.
Then ${}^tPA_\lambda P = {}^tPA_\infty P(\lambda + M_B)$.

(iii) Assume that $A_\lambda = A_\infty(\lambda + M)$ and $B_\lambda = B_\infty(\lambda + M)$. Then the solutions
of the pencil congruence problem are exactly the invertible $X$ such that

$$XM = MX \quad\text{and}\quad {}^tXA_\infty X = B_\infty. \tag{5}$$

Proof. (i) Since $A_\lambda$ is regular, $A_\infty$ is invertible and we may write
$A_\lambda = A_\infty(\lambda + A_\infty^{-1}A_0)$; likewise, $B_\lambda = B_\infty(\lambda + B_\infty^{-1}B_0)$.
Choose $P$ such that ${}^tPA_\lambda P = B_\lambda$; then

$$B_\infty(\lambda + M_B) = {}^tPA_\lambda P = {}^tPA_\infty P(\lambda + P^{-1}M_AP), \tag{6}$$

which implies $P^{-1}M_AP = M_B$ as required. The same computations prove (ii).

The equations (5) follow directly from the equality
${}^tXA_\infty(\lambda + M)X = {}^tXA_\infty X(\lambda + X^{-1}MX)$. □

We now restrict ourselves to the case where the characteristic endomorphism 
is cyclic. 

Proposition 2. Let $A_\lambda = A_\infty(\lambda + M)$ and $B_\lambda = B_\infty(\lambda + M)$ be two regular
symmetric pencils such that the matrix $M$ is cyclic, that is, its minimal and
characteristic polynomials are equal.

Then the solutions $X$ of the pencil congruence problem are the square roots
of $A_\infty^{-1}B_\infty$ in the algebra $K[M]$.

Proof. Since $M$ is cyclic, its commutant is reduced to the algebra $K[M]$; therefore,
all solutions of the congruence problem are polynomials in $M$.

Since $A_\lambda$ is symmetric, both matrices $A_\infty$ and $A_0 = A_\infty M$ are symmetric;
therefore, ${}^tMA_\infty = A_\infty M$. Since $X$ is a polynomial in $M$, we deduce that
also ${}^tXA_\infty = A_\infty X$.

The relation ${}^tXA_\infty X = B_\infty$ may therefore be rewritten as $A_\infty X^2 = B_\infty$,
or $X^2 = A_\infty^{-1}B_\infty$. □
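
The identities used in this proof can be replayed on a small instance. The sketch below is illustrative only: the field GF(7), the cubic polynomial, and the polynomial defining `X` are arbitrary choices of this example, and `A_inf` is taken to be the Hankel symmetrizer of the companion matrix `M` so that `A_inf` and `A_inf * M` are both symmetric. It builds a pencil $B$ from $A$ by the congruence $X$ and checks that $B_\infty = A_\infty X^2$ and that $B_0 = B_\infty M$.

```python
P = 7  # small prime field GF(7); an illustrative choice


def mul(a, b):
    """Matrix product over GF(P)."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def tr(a):
    """Matrix transpose."""
    return [list(row) for row in zip(*a)]


def poly_in(m, coeffs):
    """Evaluate c0*I + c1*m + c2*m^2 + ... over GF(P)."""
    n = len(m)
    result = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for c in coeffs:
        result = [[(r + c * q) % P for r, q in zip(rrow, qrow)]
                  for rrow, qrow in zip(result, power)]
        power = mul(power, m)
    return result


# M: companion matrix of x^3 + 2x^2 + 3x + 4 over GF(7) (cyclic by construction).
M = [[0, 1, 0], [0, 0, 1], [(-4) % P, (-3) % P, (-2) % P]]
# A_inf: Hankel symmetrizer of M, so that tr(M) * A_inf == A_inf * M.
A_inf = [[3, 2, 1], [2, 1, 0], [1, 0, 0]]
A_0 = mul(A_inf, M)          # pencil A_lambda = A_inf * (lambda + M)

X = poly_in(M, [1, 5, 2])    # an element of K[M]: 1 + 5M + 2M^2
B_inf = mul(mul(tr(X), A_inf), X)
B_0 = mul(mul(tr(X), A_0), X)

if __name__ == "__main__":
    print(A_0 == tr(A_0))                    # A_0 is symmetric
    print(B_inf == mul(A_inf, mul(X, X)))    # B_inf = A_inf * X^2
    print(B_0 == mul(B_inf, M))              # B_lambda = B_inf * (lambda + M)
```

Read in the other direction, this is the content of Proposition 2: recovering $X$ from the two pencils amounts to extracting a square root of $A_\infty^{-1}B_\infty$ inside $K[M]$.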

4 Although this is not used in cryptography, we mention that this section also applies 
verbatim to the case of characteristic zero. 

Theorem 2. Let $K$ be a finite field of odd characteristic and $A_\lambda$, $B_\lambda$ be two
regular pencils of quadrics over $K^n$, congruent to each other, such that at least
one is cyclic (and therefore both are). Then the pencil congruence problem may
be solved using no more than $O(n^3)$ operations in the field $K$.

Proof. The first step is to reduce to the case of primary components of the
characteristic endomorphism. This may be done, using for example Frobenius
reduction of both matrices $A_0$ and $B_0$, with a complexity of $O(n^3)$
operations. This also provides the change of basis that gives the characteristic
endomorphisms of both pencils the same matrix.

It remains to compute a square root of $C = A_\infty^{-1}B_\infty$ in $K[M]$, where
now the minimal polynomial of $M$ is $p^e$, with $p$ irreducible. For this we first
write $C$ as a polynomial $g(M)$; this again requires $O(n^3)$ operations. To solve
the equation $y^2 = g(M)$ in the ring $K[M] = K[x]/p(x)^e$, we first solve it in the
(finite) residual field $K[x]/p(x)$, with complexity $O(n^3)$ again; lifting the solution
to the ring $K[M]$ requires only $O(n^2)$ operations with Hensel lifting. □
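
The lifting step can be illustrated in the simplest local ring. The sketch below assumes, purely for illustration, that the residual field is the prime field GF(p) itself and that the irreducible polynomial is $x$, so the ring is GF(p)[H]/(H^e) with elements stored as coefficient lists; note that `p` in the code is the odd prime, not the polynomial of the proof. The residue-field root is found by brute force, and each higher coefficient is then obtained from one linear equation. The function names are hypothetical.

```python
def sqrt_mod_p(a, p):
    """Brute-force square root of a in GF(p); returns None if a is not a square."""
    for y in range(p):
        if (y * y - a) % p == 0:
            return y
    return None


def lift_sqrt(g, p):
    """Square root of g = g[0] + g[1]*H + ... in GF(p)[H]/(H^e), for odd p and g[0] != 0.

    Comparing the coefficient of H^k in y^2 = g gives
        2*y[0]*y[k] + sum_{0<i<k} y[i]*y[k-i] = g[k],
    which is linear in y[k] once the lower coefficients are known.
    """
    e = len(g)
    y0 = sqrt_mod_p(g[0] % p, p)
    if y0 is None or y0 == 0:
        return None
    inv = pow(2 * y0, p - 2, p)  # inverse of 2*y0 via Fermat's little theorem
    y = [y0] + [0] * (e - 1)
    for k in range(1, e):
        cross = sum(y[i] * y[k - i] for i in range(1, k))
        y[k] = (g[k] - cross) * inv % p
    return y


def square(y, p):
    """y^2 in GF(p)[H]/(H^e), truncated to e = len(y) coefficients."""
    e = len(y)
    return [sum(y[i] * y[k - i] for i in range(k + 1)) % p for k in range(e)]


if __name__ == "__main__":
    p, g = 7, [2, 3, 0, 5]
    y = lift_sqrt(g, p)
    print(y, square(y, p) == g)
```

The loop performs on the order of $e^2$ field operations, which is the same quadratic behaviour the proof attributes to the lifting phase; the other square root is obtained by negating every coefficient.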

Solutions of the IP1S problem are square roots of an element $C$ of the algebra
$K[M]$; therefore, the number of solutions is $2^s$, where $s$ is the number of
connected components of $K[M]$, that is, the number of prime divisors of the
minimal polynomial of $M$.
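
The count $2^s$ can be observed by exhaustive search in a tiny quotient ring. This is a sketch under stated assumptions: the field is GF(3), the element $C$ is taken to be $1$, and the two moduli are arbitrary examples with $s = 2$ and $s = 3$ distinct irreducible factors. Polynomials are stored as coefficient lists, lowest degree first, and the function names are hypothetical.

```python
from itertools import product

P = 3  # the prime field GF(3); an illustrative choice


def polymul(a, b):
    """Product of two polynomials over GF(P), coefficients low-to-high."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def reduce_mod(a, f):
    """Remainder of a modulo the monic polynomial f, returned with deg(f) coefficients."""
    a = a[:]
    d = len(f) - 1
    for k in range(len(a) - 1, d - 1, -1):
        c = a[k]
        if c:
            for i in range(d + 1):
                a[k - d + i] = (a[k - d + i] - c * f[i]) % P
    return a[:d] + [0] * max(0, d - len(a))


def count_square_roots_of_one(f):
    """Number of y in GF(P)[x]/(f) with y^2 = 1, found by brute force."""
    d = len(f) - 1
    one = [1] + [0] * (d - 1)
    return sum(1 for y in product(range(P), repeat=d)
               if reduce_mod(polymul(list(y), list(y)), f) == one)


# (x - 1)^2 * (x^2 + 1): two distinct irreducible factors, so s = 2.
F_TWO = polymul(polymul([2, 1], [2, 1]), [1, 0, 1])
# x * (x - 1) * (x^2 + 1): three distinct irreducible factors, so s = 3.
F_THREE = polymul(polymul([0, 1], [2, 1]), [1, 0, 1])

if __name__ == "__main__":
    print(count_square_roots_of_one(F_TWO), count_square_roots_of_one(F_THREE))
```

The repeated factor in the first modulus does not add solutions: only the number of distinct prime divisors matters, as stated above.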


Summary and Computer Experiments. The case where all the elementary 
divisors of V{ A) are pairwise co-prime - or equivalently where M is cyclic - rep- 
resents in practice a quite large fraction of random cases (see for instance EH]). 
In this case, as shown above, the number of solutions is exactly $2^s$, where $s$ is
the number of elementary divisors, and solutions can be efficiently computed
(in polynomial time $O(n^3)$) by our method. The highlighted structure of the
equations also provides some likely explanations of why Gröbner basis computation
methods such as those presented in [2] were successful in this case. We give
in the next table results (timings) of our Magma script SolveCyclicOddPC: $t$ is the
mean execution time when solving 100 random cyclic IP1S instances, and $\tau$ is the
observed fraction in percent of such "cyclic" instances over random instances.



4 IP1S in Characteristic Two 

Let $\mathbb{K}$ be a perfect field of characteristic two. In this case, the polarity
identity (1) shows that the polar form $b = \mathcal{P}(q)$ attached to a quadratic form $q$ is an
alternating bilinear form.

4.1 Pencils of Alternating Bilinear Forms

This paragraph is a reminder of classical results. We refer the reader to [H] for 
the proofs. 

If $b$ is alternating and nondegenerate, then the vector space $V$ has a symplectic
basis, i.e. a basis $(e_1, \ldots, e_n, f_1, \ldots, f_n)$ such that $b(e_i, f_i) = 1$ and all other
pairings are zero. In particular, the dimension of $V$ is even. The vector space $E$
generated by the $e_i$ is equal to its orthogonal space $E^\perp$; such a space is called a
Lagrangian space for $b$.

We recall that two matrices A and B define the same quadratic form if and 
only if V(A) = P(B) and A(A) = A(B). 

Although quadratic forms only produce alternating bilinear forms in characteristic
two, the following lemma about alternating forms is true in all characteristics.
It proves that there exists a basis of $V$ in which the pencil has the
block-matrix decomposition

$$
A_\infty = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad
A_0 = \begin{pmatrix} 0 & F \\ -{}^tF & 0 \end{pmatrix}; \quad
A_\infty^{-1}A_0 = \begin{pmatrix} {}^tF & 0 \\ 0 & F \end{pmatrix}. \tag{7}
$$

The matrix $F$ is called the Pfaffian endomorphism of $A$.

Lemma 2. Let $b = (b_\infty, b_0)$ be a regular pencil of alternating bilinear forms
on $V$. Then there exists a symplectic basis for $b_\infty$ whose Lagrangian is stable by
the characteristic endomorphism of $b$.

Proof. Let $f$ be the characteristic endomorphism of $b$. By Lemma 1 we may
replace $V$ by one of the primary components of $f$ and therefore assume that
the minimal polynomial of $f$ is $p^n$ where $p$ is a prime polynomial. By extending
scalars to $K[\lambda]/p(\lambda)$ and replacing $b_0$ by $\lambda b_\infty + b_0$ we may assume that $p(t) = t$.
We now prove the lemma by induction on $\dim V$.

Since $t^n$ is the minimal polynomial of $f$ and $b_\infty$ is non-degenerate, there
exist $x, y \in V$ such that $b_\infty(x, f^{n-1}y) = 1$. Let $W = K[f]x \oplus K[f]y$. Then we
may write $V = W \oplus W^\perp$ where both $W$ and its $b_\infty$-orthogonal $W^\perp$ are stable
by $f$; since $W^\perp$ satisfies the lemma by the induction hypothesis, we only need
to prove it for $W$.

Let $a(t) = 1 + a_1t + \cdots + a_{n-1}t^{n-1}$ be a polynomial and $x' = a(f)x$. Then we
still have $b_\infty(x', f^{n-1}y) = 1$, and moreover we can choose $a$ so that $b_\infty(x', f^iy) = 0$
for all $i = 0, \ldots, n-2$. In other words,
$(x', fx', \ldots, f^{n-1}x', f^{n-1}y, f^{n-2}y, \ldots, fy, y)$
is a symplectic basis for $b_\infty$ on $W$. By construction, its Lagrangian is $K[f]x$,
which is obviously stable by the characteristic endomorphism $f$. □

Proposition 3. Let $K$ be a binary field. Any regular pencil of alternating bilinear
forms is congruent to a pencil of the form

$$
A_\infty = \begin{pmatrix} 0 & T \\ T & 0 \end{pmatrix}, \quad
A_0 = \begin{pmatrix} 0 & TM \\ TM & 0 \end{pmatrix}, \tag{8}
$$

where $M$ is in rational (Frobenius) normal form and $T$ is the symmetric matrix
defined in Theorem 1.

Proof. From equation (7), choose a matrix $P$ such that $M = P^{-1}FP$ is
in rational normal form and define $T$ as in Theorem 1. Then the coordinate
change $\begin{pmatrix} {}^tP^{-1}T & 0 \\ 0 & P \end{pmatrix}$ produces the required form. □

Let $A$ be a pencil as in (8). The automorphism group $O(A)$ of $A$ is the set of
matrices $X = \begin{pmatrix} X_1 & X_2 \\ X_3 & X_4 \end{pmatrix}$ such that ${}^tXAX = A$, that is, all $X_i$ commute with $M$
and ${}^tX_1TX_4 + {}^tX_3TX_2 = T$.

From now on, we suppose that $M$ is cyclic and, for the sake of simplicity, that its
primary decomposition has only one component.

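Since $M$ is cyclic, all $X_i$ belong to $K[M]$. The group $O(A)$ is generated by the
elementary transformations

$$
G_1(X) = \begin{pmatrix} 1 & X \\ 0 & 1 \end{pmatrix},\quad
G_2(X) = \begin{pmatrix} 1 & 0 \\ X & 1 \end{pmatrix},\quad
G_3(X) = \begin{pmatrix} X & 0 \\ 0 & X^{-1} \end{pmatrix},\quad
G_4 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{9}
$$

where $X \in K[M]$, with $X$ invertible for $G_3(X)$. The first three transformations
generate the subgroup of positive automorphisms of $A$. This is a subgroup of index
two of the orthogonal group [6].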

4.2 Pencils of Quadratic Forms 

The following proposition deals with the diagonal terms of a quadratic form in 
the cyclic case. We recall that, using the notations of Theorem [U IK [Mo] is an 
extension field of K, and K[M\ is the (local) K[M 0 ]-algebra generated by 



We write ip(X ) = X 2 for the Frobenius map of K[Mo]. Since this is a finite field, 
the Frobenius map is bijective. It extends to K[M] as x iH l ) = )P x(H % . 

Proposition 4. Define matrices M of size n, Mo, To of size e = n/d as in 
Theorem 0 

(i) The M.-linear map K[Md] i— > K e , X t — > A(ToX) is an isomorphism. 

(ii) For any diagonal matrix D of size e, there exists a (unique) matrix C = 
ipo(D) G K[Mo] such that, for all X G K[Mo]: 

AifXDX) = A(T 0 CX 2 ). (11) 

(Hi) Let D be a diagonal matrix of size n, written as blocks D 0 , . . . , D^-i, and 
write X G K[M] as X = with Xi G K[Mo]. Also define if(D) = 

G K[M]. Then we have the relation in K[M] 

fiACXDX)) = g>{X) ■ if(D). 


(12) 

Proof. (i) Since $2 = 0$ in $K$, for any symmetric matrix $A$ and any $X$, we have

A( t XA(A)X) = A{*XAX). (13) 

Since the space K[Mo] has dimension e over K. we only have to check injectivity. 
Assume A(T 0 X) = 0 with I/O; since K[M 0 ] is a field, X is invertible. Let Y = 
y> _ 1 (.X’ _1 ). We then have 

A (T 0 ) = A(T 0 XY 2 ) = A( t Y{T 0 X)Y) = ACYA(ToX)Y ) = 0. (14) 

Let p(x ) = poH \-p e -ix e ~ 1 +x e be the minimal polynomial of Mo- From A(To) 

= 0 we deduce that p e -i = Pe- 3 = • • • = 0, which contradicts the irreducibility 
of p. 

(ii) Let C G K[M 0 ] such that A{C) = D; applying (fl3l) to the symmetric 
matrix TqC and using the symmetry of TqMq yields 

A(T 0 CX 2 ) = A^XToCX) = A(*XA(T 0 C')X) = A^XDX). (15) 

(iii) From direct computation we find that the diagonal blocks of t XDX are 

B m = hence A{B m ) = ^ A{T 0 MDj)Xf) and MB m ) = 

ZMD^piXi). □ 

For any binary field $K$, we write $\wp(K)$ for the set of elements of the form $x^2 + x$
with $x \in K$. This is an additive subgroup of $K$, and the characteristic-two analogue
of the set of squares. For any element $\alpha$ of $K[M]$, we call valuation of regularity
of $\alpha$, simply denoted $\mathrm{val}(\alpha)$, the smallest integer $m$ such that there exists an
invertible $\alpha'$ of $K[M]$ such that $\alpha = H^m\alpha'$.
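
The set of elements $x^2 + x$ is easy to enumerate in a small binary field. The sketch below assumes the field GF(2^4) represented as bit masks modulo $z^4 + z + 1$ (an illustrative choice, as are the function names) and checks that the image of $x \mapsto x^2 + x$ is closed under addition and contains exactly half of the field, so that an element outside it, like the fixed $\delta$ used later, always exists.

```python
def gf_mul(a, b, mod=0b10011, deg=4):
    """Multiply a and b in GF(2^deg) = GF(2)[z]/(mod); elements are bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a           # addition in characteristic two is XOR
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:   # reduce as soon as the degree reaches deg
            a ^= mod
    return r


def artin_schreier_image(mod=0b10011, deg=4):
    """Set of all x^2 + x for x in GF(2^deg)."""
    return {gf_mul(x, x, mod, deg) ^ x for x in range(1 << deg)}


if __name__ == "__main__":
    image = artin_schreier_image()
    closed = all((a ^ b) in image for a in image for b in image)
    outside = sorted(set(range(16)) - image)
    print(len(image), closed, outside[:1])
```

The map $x \mapsto x^2 + x$ is additive with kernel $\{0, 1\}$, which explains why its image has index two.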

Proposition 5. Any regular pencil of quadratic forms is congruent to a pencil 
of the form 



where M, T are as in Prop. 0 and Di are diagonal matrices whose values ai = 
satisfy either one or the other of the following two kinds of canonical 

forms: 

(i) ai = H m , val(a i + 03 ) > m, 02 = 0 or = SH d ~ 1 ~ m , val(ai) > m, for 
some m G {0, . . . , d}, and some fixed 6 G K[M 0 ] \ p(K[M 0 ]); 

(ii) a\ = H m or 03 = H m , val(ati + 03 ) = m, a 2 = 04 , val(a 2 ) > m for some 
m G {0, . . . , d}, and some fixed S G K[M 0 ] \ p(K[Mo]). 

Proof. By Prop. QT] we may compute bases in which the pencils of polar forms 
have the form (O . In the same bases the pencils have the form (|Tfil) with M, T, 
Mo, To as in Theorem Q] and D,, are some diagonal matrices. We now perform 
elementary transformations of the orthogonal group of V{A) to simplify the 
diagonal part of the quadratic pencil. We use the transformations $G_i(X)$ from (9)
for a matrix $X = x_0 + x_1H + \cdots + x_{d-1}H^{d-1} \in K[M]$. The effects of the elementary

transformations Gi(X) on the coefficients a* are: 

Gi(X): a 1 ^a 1 +cp(X)a 2 +r/>{A(TX)), 
a 3 G- o 3 + ip(X) o 4 + ifi(A(TX)), 
a 2 <— a 2 , 0:4 <— o 4 ; 

G 2 {X) : a 2 <— a 2 + ip{X) o x + i>(A(TX)), 
a 4 •<— a 4 + ip{X) a 3 + A(TX )), 

ai <— ai, a 3 <— 0:3; 

G 3 (X) : a i 4 -ip(X)oti, a 2 G- ^(X -1 ) a 2 , 
a 3 G- ip(X) a 3 , o 4 G- p(W _1 )a 4 ; 

Ga : «!•<-)■ a2, «-»• o 4 . 

A direct computation gives 


i>(A(TMX))= £ x 2 i _ ( d-i)H\ 


As in Prop. 0J we write Di as d blocks Dij and define a^j = 'tpoD.^j. Prom 
what we get above we explicit the effects of the elementary transformation Gi (X) 
on the coefficients otij : 

Gt(X): ai , m ^a 1 , m + £ a 2 >i x] form < 

ai,m <- a l,m + a 2 ,i x j + x 2m-(d-l) for TO > ; 

i+j=m 


If all Oj = 0, we are done: the pencil is canonical. If not, we search the value 
a* with smallest valuation. Using G 4 , we may assume it is (\\ or a 3 . We first 
suppose that we have val(oi +0:3) > m, that is a\ and 0:3 have the same trailing 
term. We call this the case (i). Using G 3 , we may assume on = H rn . and therefore 
03 = H m +a, with val(o) > m. We look then for X such that G 2 (X)(a 2 ) = 0. We 
note that the corresponding system is triangular and all equations can be solved 
except maybe for this one: a 2 > d-i- m = x d-i-2m + x d-i-2m- Therefore we may 
assume that a 2 = 0 or a 2 = 6 H d ~ 1 ~ m for some fixed S G K[M 0 ] \ p(K[Af 0 ]). We 
note also that G 2 (X) does not decrease the valuation of o 4 . We have therefore 
by hypothesis val(o 4 ) > to. 

We now examine the case (ii) where val(oi + a 3 ) = to. Using again G 3, we 
may assume that o 4 = H rn or a 3 = H rn . Let’s note 0:1 + 03 = H rn a where 
o is invertible. We are looking for X such that G 2 (X)(a 2 ) = G 2 {X)(ati). By 
hypothesis on the valuation, we can write 02 + o 4 = H m a' for some o'. We 
naturally choose X = p _1 (o'o _1 ). At this stage, we can consider that 02 = o 4 . 
However, the condition on the valuation may not hold. If by chance $\mathrm{val}(\alpha_2) > m$,
then we are done. If on the contrary $\mathrm{val}(\alpha_2) < m$, then by using $G_4$, we search
instead for a canonical form of the kind (i). □

Theorem 3. Let $K$ be a finite field with characteristic two. The cyclic case of
the IP1S problem is solvable using $O(n^3)$ operations in the field $K$. Moreover,
in the generic case, the IP1S problem has exactly $2^s$ solutions, where $s$ is the
number of components within the primary decomposition of $M$.

Proof. To solve the IP1S problem for two pencils $A$ and $B$, we may reduce them
to the same canonical form using Prop. 5, using first the primary decomposition.
Following along the proof of the proposition, we see that it is constructive and
that all linear algebra algorithms used require at most $O(n^3)$ field operations.

Solutions of the IP1S problem correspond bijectively to automorphisms of the 
canonical pencil. In the generic case, the ideal generated by the values (a\,af) 
is the full algebra K[M]; the canonical pencil is then such that that 04 = 1 
and a.i G {0,SH d ~ 1 }. 

For both values of a.i, since the equation x\_ x + x d _ x = 0 has only the 
solutions 0 and 1 in each component IK [Mo], the IP IS problem has in this case 
exactly 2 s solutions. □ 

IP1S Problem for a and b : Summary and Computer Experiments. 

The next table gives timings of our Magma script SolveCyclicEvenIP1S, with the
same convention as for the odd case: $\tau$ represents the observed fraction of cyclic
cases and $t$ the average computing time over these cases.


    q       n      t        τ (%)
    $2^8$   20     0.2 s    100
    $2^8$   32     0.6 s    100
    $2^8$   80     20 s     100
    $2^8$   128    133 s    100


5 Conclusion and Future Work 

We have shown that special instances of the quadratic homogeneous IP1S problem
with $m = 2$ equations can be solved in polynomial time. These instances are
those where the characteristic endomorphism of the pencil (or its Pfaffian when
the characteristic of the field is 2) is cyclic, and represent in practice a large
fraction of generic instances. In subsequent work, we studied the case where
the characteristic endomorphism is no longer cyclic and found similar results,
to be published, at least for odd characteristic fields. In work still in progress,
we try to extend these results to the QIP1S problem with more than 2 equations,
and therefore expect to confirm that QIP1S is not as hard as GI.

Abstract. The security of public-key encryption (PKE), a widely used
cryptographic primitive, has received much attention in the cryptology 
literature. Many security notions for PKE have been proposed, includ- 
ing several versions of CPA-security, CCA-security, and non-malleability. 
These security notions are usually defined via a game that no efficient 
adversary can win with non-negligible probability or advantage. 

If a PKE scheme is used in a larger protocol, then the security of this 
protocol is proved by showing a reduction of breaking a certain security 
property of the PKE scheme to breaking the security of the protocol. A 
major problem is that each protocol requires in principle its own tailor- 
made security reduction. Moreover, which security notion of the PKE 
scheme should be used in a given context is a priori not evident; the 
employed games model the use of the scheme abstractly through oracle 
access to its algorithms, and the sufficiency for specific applications is 
neither explicitly stated nor proven. 

In this paper we propose a new approach to investigating the application
of PKE, based on the constructive cryptography framework [24, 25].
The basic use of PKE is to enable confidential communication from a 
sender A to a receiver B, assuming A is in possession of B’s public key. 
One can distinguish two relevant cases: The (non-confidential) communi- 
cation channel from A to B can be authenticated (e.g., because messages 
are signed) or non-authenticated. The application of PKE is shown to 
provide the construction of a secure channel from A to B from two (as- 
sumed) authenticated channels, one in each direction, or, alternatively, 
if the channel from A to B is completely insecure, the construction of a 
confidential channel without authenticity. Composition then means that 
the assumed channels can either be physically realized or can themselves 
be constructed cryptographically, and also that the resulting channels 
can directly be used in any applications that require such a channel. The 
composition theorem of constructive cryptography guarantees the sound- 
ness of this approach, which eliminates the need for separate reduction 
proofs. 

We also revisit several popular game-based security notions (and vari- 
ants thereof) and give them a constructive semantics by demonstrating 
which type of construction is achieved by a PKE scheme satisfying which 
notion. In particular, the necessary and sufficient security notions for the 
above two constructions to work are CPA-security and a variant of CCA- 
security, respectively. 

K. Sako and P. Sarkar (Eds.): ASIACRYPT 2013, Part I, LNCS 8269, pp. 134-153, 2013.

1 Introduction

Public-key encryption (PKE) is a cryptographic primitive devised to achieve
confidential communication in a context where only authenticated (but not
confidential) communication channels are available [11, 34]. The cryptographic security of
PKE is traditionally defined in terms of a certain distinguishing game in which 
no efficient adversary is supposed to achieve a non-negligible advantage. There 
exists quite a wide spectrum of security notions and variants thereof. These no- 
tions are motivated by clearly captured attacks (e.g., a chosen-ciphertext attack) 
that should be prevented, but in some cases they seem to have been proposed 
mainly because they are stronger than previous notions or can be shown to be 
incomparable. 

This raises the question of which security notion for PKE is suitable or neces- 
sary for a certain higher-level protocol (using PKE) to be secure. The traditional 
answer to this question is that for each protocol one (actually, a cryptography 
expert) needs to identify the right security notion and provide a reduction proof 
to show that a PKE satisfying this notion yields a secure protocol.¹

An alternative approach is to capture the semantics of a security notion by
characterizing directly what it achieves, making explicit in which applications
it can be used securely. The constructive cryptography paradigm [24, 25] was
proposed with this general goal in mind. Resources such as different types of
communication channels are modeled explicitly, and the goal of a cryptographic
protocol or scheme $\pi$ is to construct a stronger or more useful resource $S$ from
an assumed resource $R$, denoted as $R \stackrel{\pi}{\Longrightarrow} S$. Two such construction steps can
then be composed, i.e., if we additionally consider a protocol $\psi$ that assumes the
resource $S$ and constructs a resource $T$, the composition theorem states that

$$R \stackrel{\pi}{\Longrightarrow} S \ \wedge\ S \stackrel{\psi}{\Longrightarrow} T \quad\Longrightarrow\quad R \stackrel{\psi \circ \pi}{\Longrightarrow} T,$$

where $\psi \circ \pi$ denotes the composed protocol.

Following the constructive paradigm, a protocol is built in a modular fashion 
from isolated construction steps. A security proof guarantees the soundness of 
one such step, and each proof is independent of the remaining steps. The compo- 
sition theorem then guarantees that several such steps can be composed. While 
the general approach to protocol design based on reduction proofs is in principle 
sound, it is substantially more complex, more error-prone, and not suitable for 
re-use. This is part of the reason why it is generally not applied to the design of 
real-world protocols (e.g., TLS), which in turn is the main reason for the large 
number of protocol flaws discovered in the past. A major goal in cryptography 
must be to break the cycle of flaw discovery and fixes by providing solid proofs. 
Modularity appears to be the key in achieving this goal. 

1 Note that this work is orthogonal to the foundational problem of designing practical
PKE schemes provably satisfying certain security notions, based on realistic hardness
assumptions. The seminal CCA-secure PKE scheme based on the DDH-assumption
by Cramer and Shoup [9, 10] falls into this category, as do, e.g., [13, 32, 19, 21, 35].

In this spirit, we treat the use of PKE as such a construction step. The
contributions of this paper are two-fold. First, we show how one can construct, 
using PKE, confidential channels from authenticated and insecure channels (cf. 
Section 1.1 and Section 3). Second, we revisit several known game-based security
notions (and variants thereof) and give them a constructive semantics, providing 
an explicit understanding of the application contexts for which a given notion is 
suitable (cf. Section fTT^l and Section^]). In Section fOl we describe how our results, 
although stated in a simpler setting, capture settings with multiple senders and 
the notion of corruption that exists in other frameworks, and in Section 1.4 we
contrast the constructive paradigm with the approach of idealizing the properties 
of cryptographic schemes. Related work is discussed in Section [131 


1.1 Constructing Confidential Channels Using PKE 

From the perspective of constructive cryptography [24, 25], the purpose of a
public-key encryption scheme is to construct a confidential channel from non- 
confidential channels. Here, a channel is a resource (or functionality) that in- 
volves a sender, a receiver, and — to model channels with different levels of 
security — an attacker. A channel generally allows the sender to transmit a mes- 
sage to the receiver; the security properties of a particular channel are captured 
by the capabilities available to the attacker, which might, e.g., include reading 
or modifying the messages in transmission. 

The parties access the channel through interfaces that the channel provides
and that are specific to each party. For example, the sender's interface allows
messages to be input, and the receiver's interface allows them to be received. We refer
to the interfaces by labels A, B, and E, where A and B are the sender's and
the receiver's interfaces, respectively, and E is the adversary's interface. In this
work, we consider the following four types of channels (from A to B; channels in
the opposite direction are defined analogously), using the notation from 

- An insecure channel, denoted $-\!\!\twoheadrightarrow$, allows the adversary to read, deliver,
  and delete all messages input at A, as well as to inject its own messages.

- An authenticated channel, denoted $\bullet\!-\!\diamond\!\twoheadrightarrow$, still allows the adversary to read
  all messages, but the adversary is limited to forwarding or deleting messages input at A.

- A confidential channel, denoted $-\!\diamond\!\twoheadrightarrow\!\bullet$, only leaks the length of the messages
  but does not necessarily prevent injections.

- A secure channel, denoted $\bullet\!-\!\diamond\!\twoheadrightarrow\!\bullet$, also only leaks the message length, and
  only allows the adversary to forward or delete messages input at A.
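
The four channel types differ only in what they grant the adversary interface E. The sketch below encodes that as two boolean capabilities; it is a deliberately coarse illustration (the class, field, and function names are hypothetical, and scheduling or deletion of messages is not modeled), not the formal resource definition of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Coarse model of a channel from A to B, as seen from the adversary interface E."""
    name: str
    leaks_content: bool     # E reads whole messages (otherwise only their length)
    allows_injection: bool  # E may inject its own messages toward B


INSECURE = Channel("insecure", leaks_content=True, allows_injection=True)
AUTHENTICATED = Channel("authenticated", leaks_content=True, allows_injection=False)
CONFIDENTIAL = Channel("confidential", leaks_content=False, allows_injection=True)
SECURE = Channel("secure", leaks_content=False, allows_injection=False)


def adversary_view(channel, message):
    """What E learns about a message input at A on the given channel."""
    return message if channel.leaks_content else len(message)


if __name__ == "__main__":
    for ch in (INSECURE, AUTHENTICATED, CONFIDENTIAL, SECURE):
        print(ch.name, adversary_view(ch, b"hello"), ch.allows_injection)
```

In every case the adversary may still forward or delete messages; the two flags capture only the differences listed above.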

To use public-key encryption, the receiver initially generates a key pair and 
transmits the public key to the sender. The sender needs to obtain the correct 
public key, which corresponds to assuming that the channel from B to A is 

2 The $\bullet$ in the notation signifies that the capabilities at the marked interface, i.e.,
sending or receiving, are exclusive to the respective party. If the $\bullet$ is missing, the
adversary also has these capabilities. The $\diamond$-symbol is explained in Section 2.4.
authenticated ($\leftarrow\!\!\bullet$).³ To transmit a message confidentially, the sender then
encrypts the message under the received public key and sends the ciphertext to
the receiver over a channel that could be authenticated or completely insecure.

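The exact type of channel that is constructed depends on the type of assumed
channel used to transmit the ciphertext to the receiver: We show that if the
assumed channel is authenticated ($\bullet\!-\!\diamond\!\twoheadrightarrow$) and the PKE scheme is ind-cpa-secure,
the constructed channel is a secure channel ($\bullet\!-\!\diamond\!\twoheadrightarrow\!\bullet$). If the assumed channel is
insecure ($-\!\!\twoheadrightarrow$) and the PKE scheme is ind-cca-secure, the constructed channel is
only confidential ($-\!\diamond\!\twoheadrightarrow\!\bullet$). Using the above notation, for protocols $\pi$ and $\pi'$ based
on ind-cpa and ind-cca encryption schemes, respectively, these constructions can
be written as

$$
[\,\leftarrow\!\!\bullet,\ \bullet\!-\!\diamond\!\twoheadrightarrow\,] \stackrel{\pi}{\Longrightarrow} \bullet\!-\!\diamond\!\twoheadrightarrow\!\bullet
\quad\text{and}\quad
[\,\leftarrow\!\!\bullet,\ -\!\!\twoheadrightarrow\,] \stackrel{\pi'}{\Longrightarrow} -\!\diamond\!\twoheadrightarrow\!\bullet,
$$

where the bracket notation means that both resources in the brackets are available.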
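
The message flow behind the first construction can be made concrete with a toy example. The sketch below is an assumption-laden illustration: it uses a textbook ElGamal-style scheme with an arbitrary modulus and base chosen only so that the code runs, makes no security claim whatsoever, and models both assumed channels simply as values that reach the other party unmodified while remaining visible to the adversary. All names are hypothetical.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; illustrative modulus only, not a vetted parameter
G = 3           # illustrative base


def keygen():
    """Receiver B: sample a secret exponent and derive the public key."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    """Sender A: encrypt an integer message with 1 <= m < P."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def decrypt(sk, ciphertext):
    """Receiver B: undo the mask, using c1^(P-1-sk) = c1^(-sk) mod P."""
    c1, c2 = ciphertext
    return (c2 * pow(c1, P - 1 - sk, P)) % P


def run_protocol(m):
    """One run over two assumed authenticated channels (B to A, then A to B)."""
    sk, pk = keygen()
    pk_at_sender = pk                  # B to A: delivered unmodified, readable by E
    ciphertext = encrypt(pk_at_sender, m)
    adversary_view = (pk, ciphertext)  # A to B: E reads the ciphertext but cannot replace it
    return decrypt(sk, ciphertext), adversary_view


if __name__ == "__main__":
    received, view = run_protocol(123456789)
    print(received == 123456789)
```

The adversary's view consists of the public key and the ciphertext only; the claim that this view reveals nothing beyond the message length is exactly what the ind-cpa assumption on a real scheme is meant to provide.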

The notion of constructing the confidential (or secure) channel from the two 
assumed non-confidential ones is made precise in a simulation-based sense mu, 
where the simulator can be interpreted as translating all attacks on the protocol 
into attacks on the constructed (ideal) channel. As the constructed channel is 
secure by definition, there are no attacks on the protocol. 

The composability of the construction notion then means that the constructed 
channel can again be used as an assumed resource (possibly along with additional 
assumed or constructed resources) in other protocols. For instance, if a higher- 
level protocol uses the confidential channel to transmit a message together with a 
shared secret value in order to achieve an additionally authenticated (and hence 
fully secure) transmission of the message, then the proof of this protocol is based on 
the “idealized” confidential channel and does not (need to) include a reduction to 
the security of the encryption scheme. In the same spirit, the authenticated chan- 
nel from B to A could be a physically authenticated channel, but it could also be 
constructed by using, for instance, a digital signature scheme to authenticate the 
transmission of the public key (which is done by certificates in practice). 

1.2 Constructive Semantics of Game-Based Security Notions 

Security properties for PKE are often formalized via a game between a hypo- 
thetical challenger and an attacker. We assign constructive semantics to several 
existing game-based definitions by first characterizing the appropriate assumed 
and constructed resources and then showing that the “standard use” of a PKE 
scheme over those channels (as illustrated in Section 1.1) achieves the construction
if (and sometimes only if) it has the considered property.⁴

In particular, we show that ind-cpa-security is not only sufficient but also nec- 
essary for constructing a secure channel from two authenticated channels. For 

3 The simple arrow indicates that $\leftarrow\!\!\bullet$ is a single-use channel, i.e., only one message
can be transmitted.

4 We point out that our negative results do not rule out the existence of other protocols 
that are derived from the scheme in some possibly more complicated way; those could 
still achieve the respective construction. 
the construction of a confidential channel from an authenticated and an insecure
channel, it turns out that ind-cca-security, while sufficient, is unnecessarily
strong. The transformation only requires the weaker notion of ind-rcca-security, 
which was introduced by Canetti et al. [5] to avoid the artificial strictness of 
ind-cca. We continue the analysis of ind-cca-security and follow up on work by 
Bellare et al. [1], where several non-equivalent definitional variants are consid- 
ered. We show that only the stricter notions they consider are sufficient for the 
channel construction, leaving the exact semantics of the weaker notions unclear. 

We also consider non-adaptive CCA-security (ind-cca1) and non-malleability
(nm-cpa). We show that both notions correspond to transformations between 
somewhat artificial channels, but might still be useful for specific applications. 


1.3 Capturing Settings with Potentially Corrupted Senders 

Although our security definitions for public-key encryption are phrased in a setting
where there is only one legitimate sender (at the A-interface), our treatment
can be “lifted” to a setting with multiple senders generically, cf. [21]. In a scenario
with multiple senders, it is important to formulate the guarantees that are
maintained if one or more of the senders deviate from the protocol because their 
machines are controlled by some attacker (or virus). This is captured in most 
security frameworks by considering an external adversary that has the capability 
of corrupting some of the parties. In the context of PKE and secure communi- 
cation, the goal is to still provide confidentiality guarantees to non-corrupted 
senders. (If the receiver is corrupted, then no security can be guaranteed.) 

The ability of an attacker to act on behalf of corrupted senders means that it 
can directly send (bogus) ciphertexts to the receiver, even if the communication 
to the receiver is authenticated. This capability corresponds exactly to the case 
of assuming only an unauthenticated channel, where the messages are injected 
via the E-interface. Hence, our treatment extends to the case of (static) sender 
corruption by considering the lifting that relates the interfaces of the parties 
in the multi-party scenario to the A-interface in the three-party setting, and 
provides all capabilities of the statically corrupted parties also at the E-interface. 

In summary, the security of public-key encryption in the presence of potentially
(statically) corrupted senders corresponds exactly to the construction of
a confidential channel –◇↠• from one insecure channel –◇↠ and one authenticated
channel ←–• in the opposite direction, as discussed in Section 1.1. This
implies that in the presence of (static) corruption, ind-rcca security is required 
and sufficient both in the case where the channel from the sender to the receiver 
is authenticated, and also where it is not authenticated. 

1.4 Idealizing Properties vs. Constructing Resources 

The security guarantees that one requires from a cryptographic scheme can be 
modeled in fundamentally different ways, even within a single formal security 
framework. One approach, which underlies the PKE functionality $\mathcal{F}_{\mathrm{PKE}}$ in the UC framework,
is to idealize the properties of the algorithms that comprise the scheme. Such a
functionality corresponds to a cryptographic scheme, and its interfaces closely
resemble the interfaces of the algorithms (although, e.g., the private key is never
output by $\mathcal{F}_{\mathrm{PKE}}$). In such a treatment, elements that are essential for using the
scheme, such as the ciphertext or the public key, will still appear in the func- 
tionality, but they are idealized in that, e.g., the ciphertext is independent of 
the corresponding plaintext; the idealized scheme is unbreakable by definition. 

Another — fundamentally different — approach is to explicitly model resources 
that are available to one or more parties. The communication channels we 
describe in Section 1.1 can be considered network resources; there are also functionalities
in the UC framework, such as $\mathcal{F}_{\mathrm{AUTH}}$ or $\mathcal{F}_{\mathrm{SC}}$ in [7], that can be interpreted
in this way. More generally, one can also think of randomness, memory, or
even computation as resources of this type. Following the constructive paradigm, 
the guarantees of a cryptographic scheme are not a resource, but modeled as 
the guarantee that the scheme transforms one (assumed) resource into another 
(constructed) resource (cf. footnote 5). Compared to ideal functionalities of the above type,
the description of resources tends to be simpler and easier to understand. For 
example, in the case of public-key encryption, the confidential channel does not 
need to specify implementation artifacts such as ciphertexts or public keys. 

While both approaches allow one to divide the security proof of a composite protocol
into several steps that can be proven independently, only the second approach
enables a fully modular protocol design. Each sub-protocol achieves a
well-defined construction step transforming a resource R into a resource S, which 
abstracts from how S is achieved. A higher-level protocol can thus use such a 
resource S independently of how it is obtained, and the construction of S can 
be replaced with a different one without affecting the design or proof of the 
higher-level protocol. Concretely, a protocol using the resource –◇↠• does not
depend on whether or not the channel is constructed by a PKE scheme, whereas
a protocol using the functionality $\mathcal{F}_{\mathrm{PKE}}$ will always be specific to this step.


1.5 Related Work 

We provide here an abridged comparison with related work. A more comprehen- 
sive comparison can be found in the full version of this work. 

Game-based security. The study of PKE security was initiated by Goldwasser 
and Micali na, who introduced the notions of indistinguishability and seman- 
tic security. Yao’s J2S] definition, based on computational entropy, was shown 
equivalent to variants of El by Micali et al. [30]. Goldreich [14,15] made important
modifications and also dealt with uniform adversaries. Today’s widely-used

5 By contrast, a typical UC security statement is that a cryptographic scheme implements
some functionality. While statements about hybrid protocols in UC appear
similar to constructive statements, they are less expressive since, e.g., the UC framework
technically does not allow one to make statements about assuming only bounded
resources, as protocols that use hybrid functionalities can always instantiate arbitrarily
many functionalities of a given type.
variant, indistinguishability under chosen-plaintext attack or ind-cpa, has been
strengthened by considering more powerful attackers that can additionally obtain
decryptions of arbitrary ciphertexts. This led to the notions of ind-cca1 and
ind-cca2 (e.g., [31,37]). Different variants of ind-cca2-security were compared by
Bellare et al. [3]. Canetti et al. [8] introduced the weaker notion ind-rcca that
suffices for many applications. A second important security property is
non-malleability, introduced by Dolev et al. [12]. Informally, it requires that an
adversary cannot change a ciphertext into one that decrypts to a related message.
Variations of this notion have been considered in subsequent work m- 

Real-world/ideal-world security. The idea of defining protocol security with re- 
spect to an ideal execution was first proposed by Goldreich et al. [IS] ; the concept 
of a simulator can be traced back to the seminal work by Goldwasser et al. [18] on 
zero-knowledge proofs. General security frameworks that allow the formalization 
of arbitrary functionalities to be realized by cryptographic protocols have been 
introduced by Canetti [5] as universal composability (UC) as well as by Backes 
et al. [3311] as reactive simulatability (RSIM). Treatments of PKE exist in both
frameworks. The treatment in UC is with respect to an “ideal PKE” functional- 
ity; realizing this functionality is equivalent to ind-cca2-security [8]. Canetti and 
Krawczyk [7] formulate UC functionalities that model different types of commu- 
nication channels and can be interpreted as network resources; they do not treat 
public-key encryption from this perspective. The formalization of the functional- 
ities in m is closer to our approach, but less modular and hence formally more 
complex. In particular, the treatment is restricted to the case where the authen- 
ticated transmission of the ciphertexts is achieved by digital signatures instead 
of using a generic composition statement. More generally, both frameworks [5] 
and [33] are designed from a bottom-up perspective (starting from a selected 
machine model), whereas we follow the top-down approach of [35], which leads 
to simpler, more abstract definitions and statements. 

Maurer et al. [33] described symmetric encryption following the constructive 
cryptography paradigm as the construction of confidential channels from non- 
confidential channels and shared keys, and compared the security definitions 
they obtained with game-based definitions. The goal of this work is to provide a 
comparable treatment for the case of PKE. In the same spirit, specific anonymity- 
related properties of PKE have been discussed by Kohlweiss et al. m- 

2 Preliminaries 

2.1 Systems: Resources, Converters, Distinguishers, and Reductions 

At the highest level of abstraction (following the hierarchy in [35]), systems are
objects with interfaces by which they connect to (interfaces of) other systems; 
each interface is labeled with an element of a label set and connects to only a 
single other interface. This concept of abstract systems captures the topological 
structures that result when multiple systems are connected in this manner. 
The abstract systems concept, however, does not model the behavior of systems,
i.e., how the systems interact via their interfaces. Consequently, statements
about cryptographic protocols are statements at the next (lower) abstraction
level. In this work, we describe all systems in terms of (probabilistic) discrete
systems, which we explain in Section 2.2.

Resources and converters. Resources in this work are systems with three interfaces
labeled by A, B, and E. A protocol is modeled as a pair of two so-called
converters (one for each honest party), which are directed in that they have an
inside and an outside interface, denoted by in and out, respectively. As a notational
convention, we generally use upper-case, bold-face letters (e.g., R, S) or
channel symbols (e.g., •–◇↠•) to denote resources and lower-case Greek letters
(e.g., $\alpha$, $\beta$) or sans-serif fonts (e.g., enc, dec) for converters. We denote by $\Phi$ the
set of all resources and by $\Sigma$ the set of all converters.

The topology of a composite system is described using a term algebra, where 
each expression starts from one (or more) resources on the right-hand side and is 
subsequently extended with further terms on the left-hand side. An expression 
is interpreted in the way that all interfaces of the system it describes can be 
connected to interfaces of systems which are appended on the left. For instance, 
for a single resource $R \in \Phi$, all its interfaces A, B, and E are accessible.

For $I \in \{A, B, E\}$, a resource $R \in \Phi$, and a converter $\alpha \in \Sigma$, the expression
$\alpha^I R$ denotes the composite system obtained by connecting the inside interface
of $\alpha$ to interface I of R; the outside interface of $\alpha$ becomes the I-interface of the
composite system. The system $\alpha^I R$ is again a resource (cf. Figure 1).

For two resources R and S, [R, S] denotes the parallel composition of R and
S. For each $I \in \{A, B, E\}$, the I-interfaces of R and S are merged and become
the sub-interfaces of the I-interface of [R, S], which we denote by I.1 and I.2.
A converter $\alpha$ that connects to the I-interface of [R, S] has two inside sub-interfaces,
denoted by in.1 and in.2, where the first one connects to I.1 of R
and the second one connects to I.2 of S.

Any two converters $\alpha$ and $\beta$ can be composed sequentially by connecting the
inside interface of $\beta$ to the outside interface of $\alpha$, written $\beta \circ \alpha$, with the effect
that $(\beta \circ \alpha)^I R = \beta^I \alpha^I R$. Moreover, converters can also be taken in parallel,
denoted by $[\alpha, \beta]$, with the effect that $[\alpha, \beta]^I [R, S] = [\alpha^I R, \beta^I S]$.

We assume the existence of an identity converter $\mathrm{id} \in \Sigma$ with $\mathrm{id}^I R = R$ for all
resources $R \in \Phi$ and interfaces $I \in \{A, B, E\}$ and of a special converter $\bot \in \Sigma$
with an inactive outside interface.

Distinguishers. A distinguisher is a special type of system D that connects to all 
interfaces of a resource U and outputs a single bit at the end of its interaction 
with U. In the term algebra, this appears as the expression DU, which defines 
a binary random variable. The distinguishing advantage of a distinguisher D on 
two systems U and V is defined as 

\[ \Delta^{D}(U, V) := \bigl| P[DU = 1] - P[DV = 1] \bigr| \]
and as $\Delta^{\mathcal{D}}(U, V) := \sup_{D \in \mathcal{D}} \Delta^{D}(U, V)$ for a distinguisher class $\mathcal{D}$.
The distinguishing advantage measures how much the output distribution of
D differs when it is connected to either U or V. There is an equivalence notion
on systems (which is defined on the discrete systems level), denoted by $U \equiv V$,
which implies that $\Delta^{D}(U, V) = 0$ for all distinguishers D. The distinguishing
advantage satisfies the triangle inequality, i.e., $\Delta^{D}(U, W) \le \Delta^{D}(U, V) + \Delta^{D}(V, W)$
for all resources U, V, and W and distinguishers D.
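
As a small illustration of these definitions, the following Python sketch computes the distinguishing advantage exactly for toy one-shot systems. It rests on simplifying assumptions that are not part of the formal model: a system is represented only by the distribution of a single output, a distinguisher is a deterministic function from that output to a bit, and the names `advantage`, `best_advantage`, `U`, `V`, and `W` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product


def advantage(dist, u, v):
    """Delta^D(U, V) = |P[DU = 1] - P[DV = 1]| for a deterministic D."""
    p_u = sum(p for x, p in u.items() if dist(x) == 1)
    p_v = sum(p for x, p in v.items() if dist(x) == 1)
    return abs(p_u - p_v)


def best_advantage(u, v):
    """Maximum advantage over all deterministic distinguishers."""
    outcomes = sorted(set(u) | set(v))
    best = Fraction(0)
    # Enumerate every function from outcomes to {0, 1}.
    for bits in product((0, 1), repeat=len(outcomes)):
        table = dict(zip(outcomes, bits))
        best = max(best, advantage(lambda x: table[x], u, v))
    return best


# Toy systems, each described by the distribution of its single output.
U = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
V = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
W = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
```

For these toy systems the maximum over all deterministic distinguishers coincides with the statistical distance of the output distributions, and the triangle inequality can be checked for any fixed distinguisher.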

Games. We capture games defining security properties as distinguishing problems
in which an adversary A tries to distinguish between two game systems $G_0$
and $G_1$. Game systems (or simply games) are single-interface systems, which
appear, similarly to resources, on the right-hand side of the expressions in the 
term algebra. The adversary is a distinguisher that connects to a game (instead 
of a resource). We denote by $\mathcal{A}$ the class of all adversaries for games.

Reductions. When relating two distinguishing problems, it is convenient to use a 
special type of system C that translates one setting into the other. Formally, C 
is a converter that has an inside and an outside interface. When it is connected 
to a system S, which is denoted by CS, the inside interface of C connects to the 
(merged) interface(s) of S and the outside interface of C is the interface of the 
composed system. C is called a reduction system (or simply reduction). 

To reduce distinguishing two systems S, T to distinguishing two systems U, V,
one exhibits a reduction C such that $CS \equiv U$ and $CT \equiv V$ (cf. footnote 6). Then, for all
distinguishers D, we have $\Delta^{D}(U, V) = \Delta^{D}(CS, CT) = \Delta^{DC}(S, T)$. The last
equality follows from the fact that C can also be thought of as being part of the
distinguisher.


2.2 Discrete Systems 

Protocols that communicate by passing messages and the respective resources are 
described as (probabilistic) discrete systems. Their behavior can be formalized by 
random systems as in [23], i.e., as families of conditional probability distributions
of the outputs (as random variables) given all previous inputs and outputs of the 
system. For systems with multiple interfaces, the interface to which an input or 
output is associated is specified as part of the input or output. For the restricted 
(but here sufficient) class of systems that for each input provide (at most) one 
output, an execution of a collection of systems is defined as the consecutive 
evaluation of the respective random systems (similarly to the models in [6,20]).


2.3 The Notion of Construction 

Recall that we consider resources with interfaces A, B, and E, where A and 
B are interfaces of honest parties and E is the interface of the adversary. We 

6 For instance, we consider reductions from distinguishing game systems to distinguish- 
ing resources. Then, C connects to a game on the inside and provides interfaces A, 
B, and E on the outside. 
formalize the security of protocols via the following notion of construction, which
was introduced in [21] (and is a special case of the abstraction notion from pZH]l: 

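Definition 1. Let $\Phi$ and $\Sigma$ be as in Section 2.1. A protocol $\pi = (\pi_1, \pi_2) \in \Sigma^2$
constructs resource $S \in \Phi$ from resource $R \in \Phi$ within $\varepsilon$ and with respect to
distinguisher class $\mathcal{D}$, denoted

\[ R \overset{(\pi,\,\varepsilon)}{\Longrightarrow} S, \]

if

\[
\begin{cases}
\Delta^{\mathcal{D}}(\pi_1^A \pi_2^B \bot^E R,\ \bot^E S) \le \varepsilon & \text{(availability)} \\
\exists\, \sigma \in \Sigma :\ \Delta^{\mathcal{D}}(\pi_1^A \pi_2^B R,\ \sigma^E S) \le \varepsilon & \text{(security).}
\end{cases}
\]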

The availability condition captures that a protocol must correctly implement 
the functionality of the constructed resource in the absence of the adversary. The 
security condition models the requirement that everything the adversary can 
achieve in the real-world system (i.e., the assumed resource with the protocol) 
he can also accomplish in the ideal-world system (i.e., the constructed resource 
with the simulator). 

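An important property of Definition 1 is its composability. Intuitively, if a
resource S is used in the construction of a larger system, then the composability
implies that S can be replaced by a construction $\pi_1^A \pi_2^B R$ without affecting the
security of the composed system. Security and availability are preserved under
composition. More formally, if for some resources R, S, and T and protocols $\pi$
and $\psi$,

\[ R \overset{(\pi,\,\varepsilon)}{\Longrightarrow} S \quad\text{and}\quad S \overset{(\psi,\,\varepsilon')}{\Longrightarrow} T, \quad\text{then}\quad R \overset{(\psi \circ \pi,\ \varepsilon + \varepsilon')}{\Longrightarrow} T, \]

and

\[ [R, U] \overset{([\pi,\,\mathrm{id}],\ \varepsilon)}{\Longrightarrow} [S, U] \quad\text{and}\quad [U, R] \overset{([\mathrm{id},\,\pi],\ \varepsilon)}{\Longrightarrow} [U, S] \]

for any resource U. More details can be found in [21].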

2.4 Channels 

We consider the types of channels listed in the table below. Each channel
initially expects a special cheating bit $b \in \{0, 1\}$ at interface E,
indicating whether the adversary is present and intends to interfere
with the transmission of the messages. The special converter $\bot$ (cf. Section 2.1)
always sets b = 0. For simplicity, we will assume that whenever $\bot$ is not present,
all cheating bits are set to 1.

Channel Name            Symbol    ℓ(m)    inj
Insecure Channel        –◇↠       m       ✓
Confidential Channel    –◇↠•      |m|     ✓
Authenticated Channel   •–◇↠      m       ✗
Secure Channel          •–◇↠•     |m|     ✗
A channel from A to B with leakage $\ell$ and message space $\mathcal{M} \subseteq \{0,1\}^*$ is
a resource with interfaces A, B, and E and behaves as follows (cf. footnote 7): When the i-th
message $m \in \mathcal{M}$ is input at interface A, it is recorded as (i, m) and $(i, \ell(m))$
is output at interface E. When (dlv, i') is input at interface E, if (i', m') has
been recorded, m' is delivered at interface B. If injections are permissible, when
(inj, m') is input at interface E, m' is output at interface B (cf. footnote 8).

The security statements in this work are parameterized by the number of 
messages that are transmitted over the channels. More precisely, for each of the 
above channels and each $n \in \mathbb{N}$, we define the n-bounded channel as the one
that processes (only) the first n queries at the A-interface and the first n queries
at the E-interface (as described above) and ignores all further queries at these
interfaces. We then require from a protocol that it constructs, for all $n \in \mathbb{N}$,
the n-bounded “ideal” channel from the n-bounded assumed channel. Wherever
the number n is significant, such as in the theorem statements, we denote the
n-bounded versions of channels by writing the n on top of the channel symbol
(e.g., $\overset{n}{–◇↠•}$); we omit it in places that are of less formal nature.

Finally, a simple-arrow symbol (e.g., •–→) denotes a single-use channel. That
is, only one message may be transmitted. 
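
The channel behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the cheating bit is taken to be 1 (the adversary is present), outputs at E and B are modeled as return values, and the class name `Channel`, its method names, and the four helper constructors are hypothetical names introduced for this illustration.

```python
class Channel:
    """Toy model of a channel from A to B with interfaces A, B, and E."""

    def __init__(self, leak, allow_inject, bound=None):
        self.leak = leak                  # leakage function applied to each message
        self.allow_inject = allow_inject  # whether (inj, m') is permitted at E
        self.bound = bound                # n for the n-bounded channel, None if unbounded
        self.recorded = {}                # index i -> i-th message input at A
        self.a_queries = 0
        self.e_queries = 0

    def send(self, m):
        """Input m at interface A; returns the output (i, leak(m)) at E."""
        if self.bound is not None and self.a_queries >= self.bound:
            return None                   # further queries are ignored
        self.a_queries += 1
        i = self.a_queries
        self.recorded[i] = m
        return (i, self.leak(m))

    def adversary(self, cmd, arg):
        """Input (dlv, i') or (inj, m') at E; returns the output at B, if any."""
        if self.bound is not None and self.e_queries >= self.bound:
            return None
        self.e_queries += 1
        if cmd == "dlv" and arg in self.recorded:
            return self.recorded[arg]
        if cmd == "inj" and self.allow_inject:
            return arg
        return None


def insecure(n=None):
    return Channel(lambda m: m, True, n)

def confidential(n=None):
    return Channel(len, True, n)

def authenticated(n=None):
    return Channel(lambda m: m, False, n)

def secure(n=None):
    return Channel(len, False, n)
```

Note that, in line with footnote 8, nothing in this model prevents the adversary from delivering a recorded message several times or out of order.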


2.5 Public-Key Encryption Schemes 

A public-key encryption (PKE) scheme with message space $\mathcal{M} \subseteq \{0, 1\}^*$ and
ciphertext space $\mathcal{C}$ is defined as three algorithms $\Pi = (K, E, D)$, where the key-generation
algorithm K outputs a key pair (pk, sk), the (probabilistic) encryption
algorithm E takes a message $m \in \mathcal{M}$ and a public key pk and outputs a ciphertext
$c \leftarrow E_{pk}(m)$, and the decryption algorithm D takes a ciphertext $c \in \mathcal{C}$ and a
secret key sk and outputs a plaintext $m \leftarrow D_{sk}(c)$. The output of the decryption
algorithm can be the special symbol ◇, indicating an invalid ciphertext.

A PKE scheme is correct if $m = D_{sk}(E_{pk}(m))$ (with probability 1 over the
randomness in the encryption algorithm) for all messages m and all key pairs 
(pk, sk) generated by K. 
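
To make the syntax (K, E, D) and the correctness condition concrete, the following Python sketch implements textbook ElGamal over the integers modulo a prime. It is an illustration only: the modulus `P`, the base `G`, and the function names are assumptions made here, the parameters are not a vetted cryptographic group, messages are integers in [1, P-1] rather than bit strings, and `None` stands in for the invalid-ciphertext symbol.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime, used here as a toy modulus
G = 3            # fixed base, chosen for illustration only


def keygen():
    """K: returns a key pair (pk, sk)."""
    sk = secrets.randbelow(P - 2) + 1         # sk in [1, P-2]
    return pow(G, sk, P), sk


def encrypt(pk, m):
    """E: probabilistic encryption of an integer message 1 <= m < P."""
    r = secrets.randbelow(P - 2) + 1          # fresh randomness per ciphertext
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def decrypt(sk, c):
    """D: returns the plaintext, or None for a malformed ciphertext."""
    c1, c2 = c
    if not (0 < c1 < P and 0 < c2 < P):
        return None
    shared = pow(c1, sk, P)                   # equals pk^r for honest ciphertexts
    return (c2 * pow(shared, -1, P)) % P      # modular inverse, Python 3.8+
```

Correctness holds with probability 1 because $c_2 \cdot (c_1^{sk})^{-1} = m \cdot pk^{r} \cdot (G^{r \cdot sk})^{-1} = m$ modulo P for every choice of the randomness r.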

It will be more convenient to phrase bit-guessing games used in definitions of 
PKE security properties as a distinguishing problem between two game systems 
(cf. Section 2.1). We consider the following games, which correspond to the
(standard) notions of ind-cpa (cpa for short), ind-cca2 (cca), ind-cca1 (cca1),
ind-rcca (rcca), and nm-cpa (nm) (cf. footnote 9). Informally, a scheme is secure in the sense of
a notion if efficient adversaries have negligible advantage in distinguishing the 
two corresponding game systems. 

7 If the cheating bit is set to b = 0, all messages input at the sender interface A are 
immediately delivered to B. 

8 Note that none of the channels prevents the adversary from reordering or replaying
messages sent over the channel. The ◇-symbol suggests the “internal buffer” in which
the channel stores messages input at A.

9 We consider the so-called real-or-random versions of these games, which are equivalent
to the more popular left-or-right formulations (as shown in [5] for symmetric
encryption). For non-malleability, we use an indistinguishability-based version by [5].
CPA game. Consider the systems $G_0^{\mathsf{cpa}}$ and $G_1^{\mathsf{cpa}}$ defined as follows: For a PKE
scheme $\Pi$, both initially run the key-generation algorithm to obtain (pk, sk)
and output pk. Upon (the first) query (chall, m), $G_0^{\mathsf{cpa}}$ outputs an encryption
$c \leftarrow E_{pk}(m)$ of m and $G_1^{\mathsf{cpa}}$ an encryption $c \leftarrow E_{pk}(\bar{m})$, called the challenge, of
a randomly chosen message $\bar{m}$ of length |m|.
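
The two CPA game systems can be sketched in Python as a single class parameterized by the bit b. The scheme object is assumed to expose `keygen`, `enc`, and `dec`; `PlaceholderScheme` is a hypothetical, deliberately insecure stand-in (deterministic encryption that merely prepends a tag), used here only so that the example is self-contained and so that a distinguishing adversary is easy to state.

```python
import secrets


class PlaceholderScheme:
    """Hypothetical stand-in for (K, E, D); offers no security whatsoever."""

    def keygen(self):
        return "pk", "sk"

    def enc(self, pk, m):
        return b"ct:" + m

    def dec(self, sk, c):
        return c[3:] if c.startswith(b"ct:") else None


class CpaGame:
    """Real-or-random game system G_b^cpa for a given scheme."""

    def __init__(self, scheme, b):
        self.scheme = scheme
        self.b = b
        self.pk, self._sk = scheme.keygen()
        self.challenged = False

    def public_key(self):
        return self.pk                        # pk is output initially

    def chall(self, m):
        """Answers only the first query (chall, m)."""
        if self.challenged:
            return None
        self.challenged = True
        if self.b == 1:
            m = secrets.token_bytes(len(m))   # random message of length |m|
        return self.scheme.enc(self.pk, m)


def adversary(game, m=b"attack at dawn"):
    """Outputs 0 if the challenge is the (deterministic) encryption of m."""
    game.public_key()
    return 0 if game.chall(m) == b"ct:" + m else 1
```

Against the placeholder scheme this adversary outputs 0 with certainty when connected to the b = 0 game and 1 with overwhelming probability when connected to the b = 1 game, which illustrates why deterministic encryption cannot be cpa-secure.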

CCA games. For $b \in \{0, 1\}$, system $G_b^{\mathsf{cca1}}$ proceeds as $G_b^{\mathsf{cpa}}$ but additionally
answers decryption queries (dec, c') before the challenge is output by returning
$m' \leftarrow D_{sk}(c')$. $G_b^{\mathsf{cca}}$ answers decryption queries at any time unless c' equals the
challenge c (if defined), in which case the answer is test.

RCCA game. Consider the systems $G_0^{\mathsf{rcca}}$ and $G_1^{\mathsf{rcca}}$ defined as follows: Initially,
both run the key-generation algorithm to obtain (pk, sk) and output pk. Upon
(the first) query (chall, m), both choose a random message $\bar{m}$ of length |m|.
$G_0^{\mathsf{rcca}}$ outputs $c \leftarrow E_{pk}(m)$ and $G_1^{\mathsf{rcca}}$ outputs $c \leftarrow E_{pk}(\bar{m})$. Both systems answer
decryption queries (dec, c'), but if $D_{sk}(c') \in \{m, \bar{m}\}$ (if m and $\bar{m}$ are defined),
the answer is test.

For more details about RCCA-security, see Section fO or consult [5], where 
the notion was introduced. 
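
The difference between the RCCA and the CCA decryption oracle can be seen in a small Python sketch. As before, this is an illustration under assumptions: `MalleableScheme` is a hypothetical, insecure placeholder whose ciphertexts consist of one random byte followed by the plaintext, so that any ciphertext can be turned into a different ciphertext of the same message; the class and method names are chosen here.

```python
import secrets

TEST = "test"


class MalleableScheme:
    """Hypothetical placeholder: one random byte followed by the plaintext."""

    def keygen(self):
        return "pk", "sk"

    def enc(self, pk, m):
        return secrets.token_bytes(1) + m

    def dec(self, sk, c):
        return c[1:] if len(c) >= 1 else None


class RccaGame:
    """Game system G_b^rcca for a given scheme."""

    def __init__(self, scheme, b):
        self.scheme = scheme
        self.b = b
        self.pk, self._sk = scheme.keygen()
        self.m = None        # challenge message, once defined
        self.m_bar = None    # random message of the same length

    def public_key(self):
        return self.pk

    def chall(self, m):
        if self.m is not None:
            return None      # only the first challenge query is answered
        self.m = m
        self.m_bar = secrets.token_bytes(len(m))
        return self.scheme.enc(self.pk, m if self.b == 0 else self.m_bar)

    def dec(self, c):
        plain = self.scheme.dec(self._sk, c)
        # The answer is "test" whenever the plaintext is m or m_bar.
        if self.m is not None and plain in (self.m, self.m_bar):
            return TEST
        return plain
```

In the b = 0 game, flipping a bit in the first byte of the challenge yields a ciphertext c' different from c that still decrypts to m. A CCA oracle, which only compares ciphertexts, would return m for this query, whereas the RCCA oracle above answers test.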

NM game. Consider the systems $G_0^{\mathsf{nm}}$ and $G_1^{\mathsf{nm}}$ defined as follows: Both initially
run the key-generation algorithm to obtain (pk, sk) and output pk. Upon (the
first) query (chall, m), $G_0^{\mathsf{nm}}$ outputs an encryption $c \leftarrow E_{pk}(m)$ of m and $G_1^{\mathsf{nm}}$
an encryption $c \leftarrow E_{pk}(\bar{m})$ of a randomly chosen message $\bar{m}$ of length |m|.
When a query $(\mathrm{dec}, c_1, \ldots, c_\ell)$ is input, both systems decrypt $c_1, \ldots, c_\ell$, return
the resulting plaintexts (if any of the ciphertexts equal c, the corresponding
plaintexts are replaced by test), and terminate the interaction.

2.6 Asymptotics 

To allow for asymptotic security definitions, cryptographic protocols are often 
equipped with a so-called security parameter. We formulate all statements in this 
paper in a non-asymptotic fashion, but asymptotic statements can be obtained 
by treating systems S as asymptotic families $\{S_n\}_{n \in \mathbb{N}}$ and letting the distinguishing
advantage be a real-valued function of n. Then, for a given notion of
efficiency, one can consider security w.r.t. classes of efficient distinguishers and a 
suitable negligibility notion. All reductions in this work are efficient with respect 
to the standard polynomial-time notions. 

3 Constructing Confidential Channels with PKE 

The main purpose of public-key encryption (PKE) is to achieve confidential com- 
munication. As a constructive statement, this means that we view a PKE scheme 
$\Pi$ as a protocol, a pair of converters (enc, dec), whose goal is to construct a confidential
channel from non-confidential channels. Differentiating between the two
cases where the communication from the sender to the receiver is authenticated 
and unauthenticated, this is written as 
\[ [\,←–•,\ •–◇↠\,] \overset{(\mathsf{enc},\mathsf{dec})}{\Longrightarrow} •–◇↠• \quad (1) \qquad\text{and}\qquad [\,←–•,\ –◇↠\,] \overset{(\mathsf{enc},\mathsf{dec})}{\Longrightarrow} –◇↠• \quad (2), \]

respectively.

In both cases, the single-use channel ←–• captures the ability of the sender to
obtain the receiver’s public key in an authenticated fashion. In construction (1),
the communication from the sender A to the receiver B is authenticated, which
is modeled by the channel •–◇↠. The goal is to achieve a secure channel
•–◇↠•, which only leaks the length of the messages sent at interface A. In construction (2),
the communication from A to B is completely insecure, which is captured
by the insecure channel –◇↠. Here, the goal is to achieve a confidential
channel –◇↠•, which still hides messages input at the A-interface but also allows
the adversary to inject arbitrary messages at E.

In the following, we first show how a PKE scheme $\Pi$ can be seen as a converter
pair (enc, dec). We then prove that (enc, dec) achieves construction (1) if the underlying
PKE scheme is cpa-secure, and construction (2) if the underlying PKE
scheme is cca-secure. We also briefly discuss the usefulness of the constructed
channels. 


3.1 PKE Schemes as Protocols 

Let $\Pi = (K, E, D)$ be a PKE scheme. Based on $\Pi$, we define a pair of protocol
converters (enc, dec) for constructions (1) and (2). Both converters have two sub-interfaces
in.1 and in.2 on the inside, as we connect them to a resource that is
a parallel composition of two other resources (cf. Section 2.1).

Converter enc works as follows: It initially expects a public key pk at in.1.
When a message m is input at the outside interface out, enc outputs $c \leftarrow E_{pk}(m)$
at in.2. Converter dec initially generates a key pair (pk, sk) using key-generation
algorithm K and outputs pk at in.1. When dec receives c' at in.2, it computes
$m' \leftarrow D_{sk}(c')$ and, if $m' \neq ◇$, outputs m' at the outside interface out.
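
The two converters can be sketched in Python as small stateful objects. The sketch makes several assumptions for illustration: the wiring to the channels is done by hand rather than through a term algebra, method names such as `receive_in1` and `input_out` are hypothetical, `None` plays the role of the invalid symbol, and `PlaceholderScheme` is an insecure stand-in for (K, E, D).

```python
class PlaceholderScheme:
    """Hypothetical stand-in for (K, E, D); offers no security whatsoever."""

    def keygen(self):
        return "pk", "sk"

    def enc(self, pk, m):
        return b"ct:" + m

    def dec(self, sk, c):
        return c[3:] if c.startswith(b"ct:") else None


class Enc:
    """Sender converter: expects pk at in.1, encrypts inputs from out."""

    def __init__(self, scheme):
        self.scheme = scheme
        self.pk = None

    def receive_in1(self, pk):
        self.pk = pk

    def input_out(self, m):
        """Message m arrives at out; returns the ciphertext output at in.2."""
        if self.pk is None:
            return None                  # no public key received yet
        return self.scheme.enc(self.pk, m)


class Dec:
    """Receiver converter: generates keys, outputs pk at in.1, decrypts."""

    def __init__(self, scheme):
        self.scheme = scheme
        self.pk, self._sk = scheme.keygen()

    def output_in1(self):
        return self.pk

    def receive_in2(self, c):
        """Ciphertext c' arrives at in.2; returns the output at out, if any."""
        return self.scheme.dec(self._sk, c)   # None: nothing is output
```

A run without adversarial interference then consists of passing `dec.output_in1()` to `enc.receive_in1`, encrypting with `enc.input_out`, and handing the result to `dec.receive_in2`.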


3.2 Constructing a Secure from Two Authenticated Channels 

Towards proving that the protocol (enc, dec) indeed achieves construction (1),
note first that the correctness of $\Pi$ implies that the availability condition of
Definition 1 is satisfied. To prove security, we need to exhibit a simulator $\sigma$
such that the assumed resource [←–•, •–◇↠] with the protocol converters is
indistinguishable from the constructed resource •–◇↠• with the simulator (cf.
Figure 1).

Theorem 1 implies that (enc, dec) realizes (1) if the underlying PKE scheme
is cpa-secure.

Theorem 1. There exists a simulator $\sigma$ and for any $n \in \mathbb{N}$ there exists an (efficient)
reduction C such that for every D,

\[ \Delta^{D}(\mathsf{enc}^A \mathsf{dec}^B [\,←–•,\ •–◇↠\,],\ \sigma^E\, •–◇↠•) \le n \cdot \Delta^{DC}(G_0^{\mathsf{cpa}}, G_1^{\mathsf{cpa}}). \]
Fig. 1. Left: The assumed resource (two authenticated channels) with protocol converters
enc and dec attached to interfaces A and B, denoted $\mathsf{enc}^A \mathsf{dec}^B [\,←–•,\ •–◇↠\,]$.
Right: The constructed resource (a secure channel) with simulator $\sigma$ attached to the
E-interface, denoted $\sigma^E\, •–◇↠•$. In particular, $\sigma$ must simulate the E-interfaces of the
two authenticated channels. The protocol is secure if the two systems are indistinguishable.

Proof. First, consider the following simulator $\sigma$ for interface E of •–◇↠•, which
has two sub-interfaces, denoted by out.1 and out.2, on the outside (since the
real-world system has two sub-interfaces at E): Initially, $\sigma$ generates a key pair
(pk, sk) and outputs (1, pk) at out.1. When it receives (i, l) at the inside interface
in, $\sigma$ generates an encryption $c \leftarrow E_{pk}(\bar{m})$ of a randomly chosen message $\bar{m}$ of
length l and outputs (i, c) at out.2. When (dlv, i') is input at out.2, $\sigma$ simply
outputs (dlv, i') at in.
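
The simulator just described can be sketched in Python as follows. The sketch assumes that outputs are modeled as (interface, value) pairs returned by the methods; the class and method names are hypothetical, and `PlaceholderScheme` is again an insecure stand-in for the PKE scheme that only serves to make the example runnable.

```python
import secrets


class PlaceholderScheme:
    """Hypothetical stand-in for (K, E, D); offers no security whatsoever."""

    def keygen(self):
        return "pk", "sk"

    def enc(self, pk, m):
        return b"ct:" + m

    def dec(self, sk, c):
        return c[3:] if c.startswith(b"ct:") else None


class Simulator:
    """Simulator attached to the E-interface of the secure channel."""

    def __init__(self, scheme):
        self.scheme = scheme
        self.pk, self._sk = scheme.keygen()

    def start(self):
        """Initially outputs (1, pk) at out.1, mimicking the key transmission."""
        return ("out.1", (1, self.pk))

    def from_inside(self, i, length):
        """On (i, l) from the channel: encrypt a random message of length l."""
        m_bar = secrets.token_bytes(length)
        return ("out.2", (i, self.scheme.enc(self.pk, m_bar)))

    def from_out2(self, cmd, i):
        """On (dlv, i') at out.2: forward the delivery request to the channel."""
        if cmd == "dlv":
            return ("in", ("dlv", i))
        return None
```

The simulator never sees the transmitted message, only its length, so the simulated ciphertext is independent of the message by construction.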

Consider the two systems $U := \mathsf{enc}^A \mathsf{dec}^B [\,←–•,\ •–◇↠\,]$ and $V := \sigma^E\, •–◇↠•$.
Distinguishing $G_0^{\mathsf{cpa}}$ from $G_1^{\mathsf{cpa}}$ can be reduced to distinguishing these two systems
via the following reduction system C', which connects to a game on the
inside and provides interfaces A, B, and E on the outside (cf. Section 2.1 for
details on reduction systems): Initially, C' takes a value pk from the game (on
the inside) and outputs (1, pk) at the (outside) E.1-interface. When a message
m is input at the A-interface of C', it is passed as (chall, m) to the game. The
resulting challenge c is output as (1, c) at the E.2-interface. When (dlv, 1) is
input at the E.2-interface, C' outputs m at interface B.
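
The reduction system C' can be sketched in Python for the single-message case. The sketch is self-contained under the following assumptions: outputs are modeled as (interface, value) pairs, `CpaGame` is a compact model of the game systems from Section 2.5, `PlaceholderScheme` is an insecure stand-in for the PKE scheme, and all class and method names are hypothetical.

```python
import secrets


class PlaceholderScheme:
    """Hypothetical stand-in for (K, E, D); offers no security whatsoever."""

    def keygen(self):
        return "pk", "sk"

    def enc(self, pk, m):
        return b"ct:" + m


class CpaGame:
    """Compact model of the real-or-random game system G_b^cpa."""

    def __init__(self, scheme, b):
        self.scheme, self.b = scheme, b
        self.pk, self._sk = scheme.keygen()
        self.challenged = False

    def public_key(self):
        return self.pk

    def chall(self, m):
        if self.challenged:
            return None
        self.challenged = True
        if self.b == 1:
            m = secrets.token_bytes(len(m))
        return self.scheme.enc(self.pk, m)


class ReductionCPrime:
    """Connects to a game inside; offers interfaces A, B, and E outside."""

    def __init__(self, game):
        self.game = game
        self.m = None

    def start(self):
        return ("E.1", (1, self.game.public_key()))

    def input_a(self, m):
        """Message at A: pass (chall, m) to the game, output (1, c) at E.2."""
        if self.m is not None:
            return None                 # single-message case only
        self.m = m
        return ("E.2", (1, self.game.chall(m)))

    def input_e2(self, cmd, i):
        """(dlv, 1) at E.2: output the original message m at interface B."""
        if (cmd, i) == ("dlv", 1) and self.m is not None:
            return ("B", self.m)
        return None
```

Connected to the b = 0 game, the ciphertext shown at E.2 encrypts the real message, as in the system U; connected to the b = 1 game, it encrypts a random message of the same length, as produced by the simulator in V. In both cases the message delivered at B is the one input at A.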

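We have $C'G_0^{\mathsf{cpa}} \equiv U$ and $C'G_1^{\mathsf{cpa}} \equiv V$, and thus

\[
\begin{aligned}
\Delta^{D}(\mathsf{enc}^A \mathsf{dec}^B [\,←–•,\ •–◇↠\,],\ \sigma^E\, •–◇↠•) &\le n \cdot \Delta^{DC''}(U, V) \\
&= n \cdot \Delta^{DC''}(C'G_0^{\mathsf{cpa}}, C'G_1^{\mathsf{cpa}}) \\
&= n \cdot \Delta^{DC}(G_0^{\mathsf{cpa}}, G_1^{\mathsf{cpa}}),
\end{aligned}
\]

where $C := C''C'$ and the first inequality follows from a standard hybrid argument
for a reduction system $C''$ (deferred to the full version). □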


3.3 Confidential Channels from Authenticated and Insecure Ones 

To prove that the protocol (enc, dec) achieves construction (2), we need to again
exhibit a simulator $\sigma$ such that the assumed resource [←–•, –◇↠] with the protocol
converters is indistinguishable from the constructed resource –◇↠• with
the simulator, as done in Theorem 2, which implies that (enc, dec) realizes (2) if
the underlying PKE scheme is cca-secure. We defer the proof to the full version.

Theorem 2. There exists a simulator $\sigma$ and for any $n \in \mathbb{N}$ there exists an (efficient)
reduction C such that for every D,

\[ \Delta^{D}(\mathsf{enc}^A \mathsf{dec}^B [\,←–•,\ –◇↠\,],\ \sigma^E\, –◇↠•) \le n \cdot \Delta^{DC}(G_0^{\mathsf{cca}}, G_1^{\mathsf{cca}}). \]

The confidential channel –◇↠• is the best channel one can construct from
the two assumed channels. As the E-interface has the same capabilities as the 
A-interface at both the authenticated (from B to A ) and the insecure channels, 
it will necessarily also be possible to inject messages to the receiver via the 
E-interface by simply applying the sender’s protocol converter. 


3.4 Applicability of the Constructed Channels 

The plain use of PKE yields constructions (1) and (2), i.e., one obtains the
resources •–◇↠• and –◇↠•. Both channels allow the adversary to reorder or
replay the messages sent by A. In practice, where PKE is often used to encapsulate
symmetric keys, it is important, however, that keys used in various
protocols by different users are independent. Thus, it is more useful to obtain
independent single-use channels [•–→•, …, •–→•] and [–→•, …, –→•] instead of
•–◇↠• and –◇↠•, respectively.

In the authenticated setting, given independent authenticated channels, protocol
(enc, dec) (with only formal modifications) achieves the construction

\[ [\,←–•,\ •–→, \ldots, •–→\,] \Longrightarrow [\,•–→•, \ldots, •–→•\,]. \]

In the unauthenticated setting, however, the analogous construction

\[ [\,←–•,\ –→, \ldots, –→\,] \Longrightarrow [\,–→•, \ldots, –→•\,] \]

is not achieved by (enc, dec) since, due to the absence of authenticity, the adversary
can freely take a ciphertext it observes on one of the insecure channels
–→ and insert it into another one. Thus, the ideal resource cannot consist of
independent channels. This issue can be taken care of by (explicitly) introducing
session identifiers (SIDs). A systematic treatment of handling multiple sessions
and senders can be found in [25].


4 Constructive Semantics of Game-Based Notions 

We analyze several game-based security notions from a constructive viewpoint. 
We complete the analysis of cpa-security from Section 3.2 by showing that it is 
also necessary to achieve construction (1). Moreover, we explain why the notion 
of cca is unnecessarily strict for construction (2) and that the construction in 
fact only requires the weaker notion of rcca introduced in [5]. 

Then, we follow up on work by Bellare et al. [4], who compared several variants 
of defining cca-security, and show that only the stricter notions they consider 
are sufficient for construction (2). We also provide constructive semantics for 
non-adaptive chosen-ciphertext security and non-malleability. 


4.1 CPA Security Is Necessary for Construction (1) 

We prove in Section 3.2 that indistinguishability under chosen-plaintext attacks, 
cpa-security, suffices to construct a secure channel from two authenticated 
channels. Here, we show that it is also necessary. That is, if protocol (enc, dec), based 
on a PKE scheme Π as shown in Section 3.1, achieves the construction, then Π 
must be cpa-secure. 

In the following, let 

    U := enc^A dec^B [←•, •–◇↠]   and   V := σ^E •–◇↠•, 

where σ is an arbitrary simulator. 

Theorem 3. There exist (efficient) reductions C_0 and C_1 such that for all 
adversaries A, 

\[ \Delta^{A}\bigl(G_0^{\mathrm{cpa}}, G_1^{\mathrm{cpa}}\bigr) \le \Delta^{A C_0}(U, V) + \Delta^{A C_1}(U, V). \]

Proof. Consider the following reduction systems C_0 and C_1, both connecting to 
an {A, B, E}-resource on the inside and providing a single interface on the 
outside (for the adversary): Initially, both obtain (1, pk) at the inside E.1-interface 
and output pk at the outside interface. When (chall, m) is received on the 
outside, C_0 outputs m at the inside A-interface and C_1 outputs a randomly chosen 
message m' of length |m|. Subsequently, (1, c) is received at the inside E.2-interface, and 
c is output (as the challenge) on the outside by both systems. We have 

\[ C_0 U \equiv G_0^{\mathrm{cpa}} \quad\text{and}\quad C_1 U \equiv G_1^{\mathrm{cpa}} \quad\text{and}\quad C_0 V \equiv C_1 V, \]

where the last equivalence follows from the fact that, in V, the input from 
•–◇↠• to σ is the same in both systems (the length of the message input at the 
A-interface of •–◇↠•), and therefore they behave identically. Hence, 

\[
\begin{aligned}
\Delta^{A}\bigl(G_0^{\mathrm{cpa}}, G_1^{\mathrm{cpa}}\bigr) &= \Delta^{A}(C_0 U, C_1 U) \\
&\le \Delta^{A}(C_0 U, C_0 V) + \Delta^{A}(C_0 V, C_1 V) + \Delta^{A}(C_1 V, C_1 U) \\
&= \Delta^{A C_0}(U, V) + \Delta^{A C_1}(U, V).
\end{aligned}
\]


□ 
4.2 RCCA Security Is Necessary for Construction (2) 

Indistinguishability under chosen-ciphertext attacks, cca-security, suffices to 
construct a confidential channel from an authenticated and an insecure one (cf. 
Section 3.3). It is, however, unnecessarily strict, as can be seen from the following 
example, adapted from [5]: Let Π be a PKE scheme and assume it is cca-secure. 
Consider a modified scheme Π' that works exactly as Π, except that a 0-bit is 
appended to every encryption, which is ignored during decryption. It is easily 
seen that Π' is not cca-secure, since the adversary can obtain a decryption of 
the challenge ciphertext by flipping its last bit and submitting the result to the 
decryption oracle. PKE scheme Π' can, however, still be used to achieve 
construction (2) using a simulator that issues the dlv-instruction to –◇↠• whenever 
a recorded ciphertext is received at the outside interface or one where flipping 
the last bit results in a recorded ciphertext (cf. full version for more details). 
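
The attack in this example can be replayed with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the paper: the underlying scheme is a toy XOR placeholder (neither public-key nor secure) that only serves to make the code run, the appended 0-bit is modeled as a trailing byte, and the names `enc_prime`, `dec_prime`, and `decryption_oracle` are hypothetical.

```python
import os

# Toy stand-in for the scheme Pi (an assumption: NOT public-key and NOT secure;
# it only gives us some enc/dec pair to wrap).
KEY = os.urandom(16)

def enc(m: bytes) -> bytes:
    assert len(m) <= len(KEY)
    return bytes(a ^ b for a, b in zip(m, KEY))

def dec(c: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(c, KEY))

# Modified scheme Pi': append a 0 bit (modeled as a trailing byte whose lowest
# bit is 0) to every ciphertext and ignore it during decryption.
def enc_prime(m: bytes) -> bytes:
    return enc(m) + b"\x00"

def dec_prime(c: bytes) -> bytes:
    return dec(c[:-1])

def flip_last_bit(c: bytes) -> bytes:
    return c[:-1] + bytes([c[-1] ^ 1])

def decryption_oracle(c: bytes, challenge: bytes) -> bytes:
    # A cca-style oracle: it refuses only the exact challenge ciphertext.
    if c == challenge:
        raise ValueError("challenge ciphertext must not be queried")
    return dec_prime(c)

challenge = enc_prime(b"secret message")
mauled = flip_last_bit(challenge)                 # differs from the challenge
recovered = decryption_oracle(mauled, challenge)  # yet decrypts to the same message
```

The mauled ciphertext is accepted by the oracle because it is not bit-identical to the challenge, which is exactly the kind of "replay" that rcca tolerates and cca does not.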

Canetti et al. [5] introduced the notion of replayable chosen ciphertext security, 
rcca, which is more permissive in that it allows the adversary to transform a 
ciphertext into one that decrypts to the same message. In the full version of 
this paper, we show that if protocol (enc, dec), based on a PKE scheme Π (cf. 
Section 3.1), achieves (2), then Π must be rcca-secure, and that rcca is also 
sufficient for the construction if the message space of Π is sufficiently large. 


4.3 Variants of Chosen-Ciphertext Security 

Bellare et al. [1] analyze several ways of enforcing the condition that the adver- 
sary must not query the challenge ciphertext c to the decryption oracle. They 
consider modifications along two axes: First, the condition can be enforced dur- 
ing the entire game (b for both phases) or only in the second phase (s for second 
phase), i.e., after the c has been given to the adversary. Second, one can ei- 
ther exclude adversaries with a non-zero probability of violating the condition 
from consideration (e for exclusion) or penalize an adversary (by declaring the 
game lost) whenever he asks the challenge c (p for penalty). The combination of 
these choices yields four non- equivalent notions ind-cca-sp, ind-cca-se, ind-cca-bp, 
ind-cca-be. The s-notions are equivalent to each other and to our formulation of 
cca-security (cf. Section 12.511 . The e- notions are strictly weaker and do in fact 
not even imply cca 1-security g]. Since cca 1-security is weaker than rcca-security 
and rcca is needed for construction @ , they are not sufficient for @ . 


4.4 Non-malleability 

Informally, a non-malleable PKE scheme is such that the adversary cannot 
transform a ciphertext into one that decrypts to a related message. We consider the 
notion of non-malleability under chosen-plaintext attacks, nm-cpa, and show that 
from a PKE scheme with this property we can build a protocol (enc'', dec'') that 
achieves the construction 


[<— ,-^H] 


(3) 
where the assumed (insecure) channel works like –↠ but halts when halt is input at B, 
and where the constructed channel is defined as follows: It internally keeps an 
initially empty list ℒ of messages. When the i-th message m is input at interface A, 
it is recorded as (i, m) and (i, |m|) is output at interface E. When (dlv, i') is input 
at interface E and (i', m') has been recorded, m' is appended to ℒ. When (inj, m') is 
input at interface E, m' is appended to ℒ. When dlv-all is input at B, all messages in 
ℒ are output at B, and the channel halts. 
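
The behavior of this channel can be sketched as a small Python state machine. This is only an illustration under stated assumptions: the class and method names are hypothetical, messages are byte strings, and the leakage at interface E is taken to be the index of the message together with its length.

```python
class BufferedChannel:
    """Sketch of the channel with interfaces A (sender), B (receiver), E (adversary)."""

    def __init__(self):
        self.recorded = {}    # index i -> message m input at A
        self.buffer = []      # the internal list of messages to be delivered
        self.count = 0        # number of messages input at A so far
        self.halted = False

    def input_A(self, m: bytes):
        """Record the i-th message; return the leakage (i, |m|) output at E."""
        if self.halted:
            return None
        self.count += 1
        self.recorded[self.count] = m
        return (self.count, len(m))

    def dlv_E(self, i: int) -> None:
        """(dlv, i) at E: schedule a recorded message for delivery."""
        if not self.halted and i in self.recorded:
            self.buffer.append(self.recorded[i])

    def inj_E(self, m: bytes) -> None:
        """(inj, m) at E: inject an adversarially chosen message."""
        if not self.halted:
            self.buffer.append(m)

    def dlv_all_B(self):
        """dlv-all at B: output all buffered messages at B and halt."""
        if self.halted:
            return []
        self.halted = True
        return list(self.buffer)


ch = BufferedChannel()
leak1 = ch.input_A(b"bid: 10")
leak2 = ch.input_A(b"bid: 250")
ch.dlv_E(2)           # deliver the second message first (reordering is allowed)
ch.inj_E(b"bid: 1")   # injected message, chosen without seeing any plaintext
ch.dlv_E(7)           # no message with index 7 was recorded, so this is ignored
delivered = ch.dlv_all_B()
```

Note that the adversary only ever sees indices and lengths, so injected messages cannot depend on the messages sent by A; this independence is what the non-malleability assumption is meant to buy.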

The protocol converters (enc'', dec'') are built as (enc, dec) in Section 3.1, 
except that dec'' only outputs the messages it received once dlv-all is input at 
the outside interface, at which time it also outputs halt at its inside interface 
and halts. In the full version of this paper, we prove that (enc'', dec'') achieves 
construction (3) if Π is nm-cpa-secure. 

The assumed channel could itself be constructed in a setting where A 
and B have synchronized clocks and B buffers all messages until an agreed point 
in time, when A also stops sending. By the composition theorem, the channel 
that is constructed in this manner can then serve as the assumed channel in 
construction (3) to construct the channel defined above using PKE. This channel may 
then for instance be useful for running a protocol implementing a blind auction. 

4.5 Non-adaptive Chosen-Ciphertext Security 

The notion ind-cca1-security is defined via a game G^cca1, which works as G^cca except that no 
decryption queries are answered once the adversary has been given the challenge 
ciphertext. The most natural way to translate this into a constructive statement 
is to consider the construction of a (type of) confidential channel o— where 
the adversary can inject messages at interface E only as long as no message has 
been input at A from an insecure channel o » with the same property. 

In the full version of this paper, we show that protocol (enc, dec) built from a 
cca1-secure PKE scheme Π as in Section 3.1 achieves 



Although this construction seems artificial, as with construction (3), it can be 
used in any setting where the assumed channel is an appropriate modeling of an 
available physical channel (or can itself be constructed from such a channel) . 

5 Conclusions 

The purpose of this paper is to present the basic ways of applying PKE (within a 
larger protocol) as constructive steps, to be used for the modular design of com- 
plex protocols, thus taming the complexity of security- protocol design. To be 
ultimately applicable to full-fledged real-world protocols, other relevant crypto- 
graphic primitives also need to be modeled in the same way. While for symmetric 
encryption and MACs this was explained in |28I26| , and for commitments in m , 
treating digital signatures and other cryptographic schemes and security mecha- 
nisms (sequence numbers, session identifiers, etc.) in constructive cryptography 
is left for future work (cf. [25]). 
Abstract. The equivalence of the random-oracle model and the ideal-cipher 
model has been studied in a long series of results. Holenstein, 
Künzler, and Tessaro (STOC, 2011) have recently completed the picture 
positively, assuming that, roughly speaking, equivalence is indifferentiability 
from each other. However, under the stronger notion of reset 
indifferentiability this picture changes significantly, as Demay et al. 
(EUROCRYPT, 2013) and Luykx et al. (ePrint, 2012) demonstrate. 

We complement these latter works in several ways. First, we show 
that any simulator satisfying the reset indifferentiability notion must be 
stateless and pseudo deterministic. Using this characterization we show 
that, with respect to reset indifferentiability, two ideal models are either 
equivalent or incomparable, that is, a model cannot be strictly stronger 
than the other model. In the case of the random-oracle model and the 
ideal-cipher model, this implies that the two are incomparable. Finally, 
we examine weaker notions of reset indifferentiability that, while not 
being able to allow composition in general, allow composition for a large 
class of multi-stage games. Here we show that the seemingly much weaker 
notion of 1-reset indifferentiability proposed by Luykx et al. is equivalent 
to reset indifferentiability. Hence, the impossibility of coming up with a 
reset-indifferentiable construction transfers to the setting where only one 
reset is permitted, thereby re-opening the quest for an achievable and 
meaningful notion in between the two variants. 

1 Introduction 

Idealized Models. The standard approach to cryptographic security is to reduce 
the security of a scheme to a (hopefully) well-studied algebraic or combinatorial 
complexity assumption. Unfortunately, a large number of cryptographic schemes 
do not admit a security reduction in the standard model. In these cases, the 
community often resorts to an idealized model, where we can sometimes obtain 
a proof of security. It is, of course, highly controversial whether or not proofs 
in idealized models are acceptable, but there is a tendency to prefer an analysis 
in an idealized model over the utter absence of any proof at all — in particular, 
when one is concerned with schemes that are widely deployed in practice [5I6IU] . 

Arguably the most popular model of this kind is the random-oracle model 
(ROM) where all parties have oracle access to a public, randomly chosen func- 
tion [3] . Somewhat related is the ideal-cipher model (ICM) which gives all parties 
oracle access to a public, randomly chosen (keyed) blockcipher pH] , Knowing 
that there is a close relation between pseudorandom functions and pseudorandom 
permutations, namely existential equivalence, one could suspect that the 
random-oracle model and the ideal-cipher model are equivalent, too. However, 
formalizing the notion of equivalence is delicate and so are the proofs. 

Equivalence of the ROM and ICM under Indifferentiability. Maurer, Renner and 
Holenstein [19] introduced the concept of indifferentiability, which since then has 
been regarded as the prevalent and actually only notion of equivalence between 
ideal primitives. A construction G^π with access to some primitive π is called 
indifferentiable from another ideal primitive Π if there is a simulator S such 
that the construction G^π implements an oracle that is indistinguishable from 
Π, even if the distinguisher D additionally gets access to π. Now, demanding that 
the distinguisher D distinguish (G^π, π) from Π alone makes little sense. In addition 
to the oracle Π, the distinguisher therefore gets access to the simulator S, which tries 
to emulate π's behavior consistently with Π. Thus, the distinguisher tries to 
distinguish the pair of oracles (G^π, π) from the pair of oracles (Π, S^Π). 

In the case of the ideal-cipher model and the random-oracle model, considerable 
effort has led to a proof of equivalence [11, 12, 17] under indifferentiability. 
The reason why indifferentiability was considered a suitable notion of 
equivalence is the appealing composition theorem established by Maurer et al. [19]. 
Namely, they transform any reductionist argument in the presence of the ideal 
primitive Π into a proof that relies on the existence of π only. Their theorem 
thus transforms a reduction R into a reduction R', where the latter locally 
implements a single copy of the simulator S. Jumping ahead, it will turn out that 
in this step, they rely on an implicit assumption. 

Multi-Stage Adversaries. Ristenpart et al. [20] were the first to point out 
scenarios where indifferentiability of G^π from Π was not sufficient to replace Π 
by G^π. Their counterexamples involve adversaries that run in multiple stages, 
i.e., an adversary A consists of two or more sub-adversaries, say A = (A_1, A_2), 
that do not share state (or at least not arbitrary state). Now, a reduction R 
that reduces to such a multi-stage game also needs to be split into two parts 
(R_1, R_2) where the same restriction upon the sharing of state applies. Hence, 
for the composition theorem by Maurer et al., each part of the reduction, R_1 and 
R_2, needs to implement its own, independent copy of the simulator S. However, 
in this case, the two copies of the simulator will not necessarily behave in the 
same way, as opposed to the “real” primitive π, which is, roughly, what makes 
the composition theorem collapse in the setting of multi-stage games. 

Curiously, their composition holds in the presence of strong, colluding adver- 
saries, while it does not in the setting of weaker, non-colluding ones. Usually in 
cryptography, a conservative approach corresponds to considering the strongest 
possible adversary, as a primitive that is secure against a strong adversary is also 
secure against a weaker adversary. However, the indifferentiability composition 
theorem is not, by itself, a security model or a proof of security. Instead, it is 
a tool to transform any proof in a security model in the presence of one ideal 
primitive into a security proof in the same security model in the presence of 
another ideal primitive. Hence, one tries to cover any type of security model, 
which, in particular, includes security models where state-sharing adversaries 
can mount trivial attacks. And thus, a conservative approach in the setting of 
indifferentiability demands including also weaker, namely non-colluding 
state-sharing adversaries. Technically, the composition theorem is harder to prove for 
weaker adversaries, because it transforms an adversary of one type into another 
adversary of the same type. Considering a stronger adversary corresponds to a 
stronger assumption in the theorem, but also to a harder statement to prove, 
and vice versa for weaker adversaries. 

One might hope that the distinction is of technical interest only. Unfortu- 
nately, as we argue, in basically all real-life scenarios, we need to consider multi- 
stage adversaries. Ristenpart et al. give several examples of multi-stage games 
for notions such as deterministic encryption m, key-dependent message secu- 
rity [8], related-key attacks [3], and non-malleable hash functions [10]. On the 
other hand, many classical notions of security seem inherently single stage: 
IND-CPA or IND-CCA security for encryption, or signature schemes which are 
existentially unforgeable under (adaptive) chosen message attacks. However, any 
classical definition of security becomes multi-stage if it is augmented with a 
leakage oracle. The reason is that, in the random oracle model, every party 
should have access to the random oracle. In particular, this includes the leakage 
oracle and the adversarially specified leakage function, resulting in an implicit 
second stage M- Hence, whenever side-channel attacks are reflected in a model, 
adversaries act at least in two stages — and for real-life applications, we cannot 
discard side-channel attacks. 

In order to cope with the new challenge of multi-stage adversaries, Ristenpart 
et al. put forward a strengthened notion called reset indifferentiability. Roughly 
speaking, in this game, the distinguisher may reset the simulator’s internal state 
between any two queries. Returning to ROM/ICM equivalence, an inspection of 
the simulators defined in m and [IT] (as well as jTTl . for that matter) reveals 
that their behavior varies substantially with their state and, thus, they are not 
reset indifferentiable. 

Equivalence of the ROM and ICM under Reset Indifferentiability. As plain 
indifferentiability is not sufficient to argue that two primitives are equivalent, the 
question regarding the ideal cipher model and the random oracle model is, thus, 
again open. Building on first negative results from [20], the authors of [13, 18] 
have recently shown that reset-indifferentiable constructions cannot be built via 
domain extension, thereby ruling out constructions from ideal ciphers that are 
reset indifferentiable from a random oracle; note that random oracles are usually 
perceived as having an infinite domain while ideal ciphers have a finite domain. 
With this result at hand, we thus know that ideal ciphers cannot be used to 
obtain random oracles via a reset-indifferentiable construction, but it might still 
be possible to construct an ideal cipher from a random oracle, i.e., either the 
two models are entirely incomparable, or the random-oracle model is strictly 
stronger. 
We rule out such a possibility. Our so-called duality lemma establishes that if 
there is no construction from π that is reset indifferentiable from primitive Π, then 
also vice versa, there is no construction from Π that is reset indifferentiable from 
primitive π. Hence, our theorem complements the results by Demay et al. and 
Luykx et al. [13, 18], showing that there can also not be a domain-shrinking 
construction. 

Proving that according to plain indifferentiability, the ICM and ROM are 
equivalent had been a serious challenge and finally involved a Feistel network 
with many rounds. A Feistel network is a domain-doubling construction, and 
is thus ruled out by the previous impossibility results. The few leverages that 
remain to bypass the current impossibility results possibly require quite new 
techniques. Firstly, it might still be possible to build a construction that is 
neither domain shrinking, nor domain extending. However, as we will see later, 
that means settling either direction (RO from IC and vice versa) simultaneously, 
and this might be quite challenging. The second leverage is a distinction that has 
been irrelevant in most works in the area of indifferentiability so far and that we 
would like to point out. Namely, strong indifferentiability requires the simulator 
S to work for any distinguisher D, while weak indifferentiability only demands 
that for every D, there exists a good simulator S. Known constructions are 
usually strongly indifferentiable, while most existing impossibility results rule out 
even weakly indifferentiable constructions. In contrast, we do not rule out weakly 
indifferentiable constructions. It would be interesting to see techniques that make 
non-black-box use of the distinguisher D and establish a reset-indifferentiable 
construction that is domain shrinking. 

Notions between indifferentiability and reset indifferentiability. From the cur- 
rent state-of-the-art, there are two ways to proceed: firstly, we can develop new 
techniques to exploit the few remaining leverages left to bypass the existing 
impossibility results. Secondly, we might weaken the notion of reset indifferen- 
tiability as introduced by Ristenpart et al., to a notion that is achievable by 
constructions and which is sufficient for a subclass of multi-stage games. 

Demay et al. [13] introduce resource-restricted indifferentiability where 
adversaries may share a limited amount of state. If a certain amount s of shared state 
is allowed, then their impossibility result shows that a reset-indifferentiable 
construction cannot extend the domain by more than s + [log(s)] bits. Maybe the 
additional bits allow one to bypass the impossibility results more easily, as proving 
domain extension by a few bits might be easier than requiring equality of the 
domain sizes; however, in this setting, the composition result accounts for a 
certain class of games only. 

Another approach that has been put forward by Luykx et al. [18] is to reduce 
the number of resets. Indeed, allowing for a polynomial number of resets/stages 
seems to be an overkill, as some games such as the security model for determin- 
istic encryption m and also certain forms of leakage require a constant number 
of adversarial stages only. To this end, Luykx et al. propose the notion of single- 
reset indifferentiability where a distinguisher can make a single reset call only; 
naturally, a construction that is single reset indifferentiable would be sufficient 
in any security game consisting of exactly two distinct adversarial stages such as 
deterministic encryption. Analogously, one can define n-reset indifferentiability 
for n + 1 adversarial stages. 

However, as we prove, single-reset indifferentiability is already equivalent to 
full-reset indifferentiability and so are all notions of n-reset indifferentiability. 
Hence, reducing the number of allowed reset queries does not help us to es- 
tablish composition results for a restricted class of games. Thus, if a general 
indifferentiability result is indeed impossible, then it is a curious open question 
how to cope with the uncomfortable situation. It might be possible to establish 
indifferentiability results and composition theorems for a class of games that is 
restricted in another way than by the number of queries. Indeed, it would be 
interesting to see what such a class could look like and whether there are games for 
which, in general, finding a suitable, indifferentiable construction is impossible. 

Summary of our Contributions. We first introduce the notion of pseudo-deterministic 
algorithms, which captures that a probabilistic algorithm almost always 
returns the same answer on the same queries and thus shares many properties 
with deterministic algorithms. Essentially, a probabilistic (and possibly 
stateful) algorithm A is called pseudo deterministic if no efficient distinguisher with 
black-box access to A can make A return two different answers on the same 
input. This notion of pseudo determinism can be seen as an average-case version of 
the pseudo-deterministic algorithms that were recently introduced by Goldreich, 
Goldwasser, and Ron [16]. While they require probabilism to be hard to detect 
on any input, we only require indistinguishability for efficiently generatable 
inputs, on the average. As stressed by Goldreich et al. [16], pseudo-deterministic 
algorithms are practically as useful as deterministic algorithms, but they are also 
easier to construct, which we indeed exploit in our paper. 

We will show in Section [3] that simulators for reset indifferentiability need to 
be stateless and pseudo deterministic. Simplifying pseudo determinism to de- 
terminism for the moment, this allows us to establish what we call the duality 
lemma. Perhaps surprisingly, it states that, with respect to reset indifferentiabil- 
ity, two idealized models are either equivalent or incomparable. The reason is that 
a deterministic and stateless simulator can act as a construction and vice versa. 
Consequently, in order to prove equivalence in terms of reset indifferentiability, 
this lemma makes it sufficient to prove the “easier” direction, whichever this 
might be. In turn, for impossibility results, one might use this as a tool to prove 
impossibility more easily. In fact, we use the duality lemma to establish that not 
only domain-extending constructions are impossible, but also domain-shrinking 
constructions (Section 4), thereby complementing the results of [13]. Note that 
the duality lemma covers strong indifferentiability, leaving non-black-box use of 
the distinguisher as a potential leverage to bypass this impossibility. 

The recently proposed [13] notion of single-reset indifferentiability intends to 
define a notion of indifferentiability that is easier to achieve and simultaneously 
covers an interesting class of multi-stage games that has two adversary stages 
only. Interestingly, as we establish, restricting the number of resets does not 
yield a weaker notion of equivalence. We prove that single- (and n-) reset 
indifferentiability is equivalent to reset indifferentiability (Section 5). Maybe 
surprisingly, our proof does not rely on a hybrid argument; instead, we establish a 
tight reduction that merely reduces the distinguisher’s advantage by a factor of 2. 

2 Preliminaries 

For a natural number n ∈ ℕ we denote by {0,1}^n the set of all bit strings of 
length n. By {0,1}* we denote the set of all bit strings of finite length. As usual, 
|M| denotes the cardinality of a set M and logarithms are to base 2. For some 
probabilistic algorithm A and input x we denote by A(x; R) the output of A on 
x using randomness R. Throughout this paper we assume that λ is a security 
parameter (if not explicitly given then implicitly assumed) and that algorithms 
(resp., Turing machines) run in polynomial time with respect to λ. 

In this paper we consider random oracles and ideal ciphers (defined below) 
which we will collectively refer to as ideal primitives. Although we present most 
of the results directly for ideal ciphers and random oracles, the following more 
general notion of ideal primitives allows us to generalize some of our results: 

Definition 1. An ideal primitive Π_λ is a distribution on functions indexed by 
the security parameter λ. For some algorithm A, security parameter λ and ideal 
primitive Π_λ we say that A has access to Π_λ if A has oracle access to a function 
f chosen from the distribution Π_λ. 

We simply write Π, i.e., omit the security parameter, if it is clear from the 
context. 

Remark 1. We will usually encounter only single instances of an ideal primitive 
Π at a time. Unless stated otherwise, if multiple parties have access to Π, then 
we implicitly assume that the corresponding function f was chosen from the 
distribution Π using the same randomness for all parties, i.e., all parties have 
oracle access to the same function f. 

Random Oracles and Ideal Ciphers. A random oracle is the uniform 
distribution on all functions mapping {0,1}^ℓ to {0,1}^m with ℓ := ℓ(λ) and 
m := m(λ). An ideal cipher (ℰ_{k,n})_λ is the uniform distribution on all keyed 
permutations of the form {0,1}^k × {0,1}^n → {0,1}^n with k := k(λ) and n := 
n(λ). That is, for a cipher ℰ in the support of (ℰ_{k,n})_λ, each key κ ∈ {0,1}^k describes 
a random (independent) permutation ℰ(κ, ·) : {0,1}^n → {0,1}^n. By abuse of 
notation, the term random oracle (resp., ideal cipher) also refers to a specific 
instance chosen from the respective distribution. 
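
For intuition, both primitives can be sampled lazily, as in the following minimal Python sketch. The class names, the integer encoding of bit strings, and the tiny parameters are assumptions made for illustration, and Python's `random` module only stands in for true uniform randomness.

```python
import random

class RandomOracle:
    """Lazily sampled function from {0,1}^ell to {0,1}^m (bit strings as ints)."""

    def __init__(self, ell: int, m: int, rng: random.Random):
        self.ell, self.m, self.rng = ell, m, rng
        self.table = {}

    def query(self, x: int) -> int:
        assert 0 <= x < 2 ** self.ell
        if x not in self.table:                      # fresh point: sample its image
            self.table[x] = self.rng.getrandbits(self.m)
        return self.table[x]                         # repeated queries are consistent


class IdealCipher:
    """One independent random permutation of {0,1}^n per key in {0,1}^k."""

    def __init__(self, k: int, n: int, rng: random.Random):
        self.k, self.n, self.rng = k, n, rng
        self.perms = {}                              # key -> (forward, inverse) tables

    def _perm(self, key: int):
        assert 0 <= key < 2 ** self.k
        if key not in self.perms:                    # sample this key's permutation once
            fwd = list(range(2 ** self.n))
            self.rng.shuffle(fwd)
            inv = [0] * len(fwd)
            for x, y in enumerate(fwd):
                inv[y] = x
            self.perms[key] = (fwd, inv)
        return self.perms[key]

    def encrypt(self, key: int, x: int) -> int:
        return self._perm(key)[0][x]

    def decrypt(self, key: int, y: int) -> int:
        return self._perm(key)[1][y]


ro = RandomOracle(ell=8, m=8, rng=random.Random(1))
ic = IdealCipher(k=2, n=4, rng=random.Random(2))
```

Sampling a whole permutation per key is only feasible for toy block lengths; it is used here because it makes the permutation property evident.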

Keyed vs. unkeyed ciphers. The ideal-cipher model has either been considered as 
a public unkeyed permutation or as a public keyed permutation. We present our 
results in the keyed setting since we feel that the ideal-cipher model is usually 
perceived in this way. However, we want to point out that the results are equally 
valid for the unkeyed setting because our proofs do not rely on the presence of 
a key. 
Independently of this, one might be tempted to argue that the settings are 
interchangeable since we know, for example, constructions of a keyed permutation 
from an ideal public permutation (Even and Mansour, [15]). Note though, that 
in order to make this argument work, one needs to show that these constructions 
are reset indifferentiable. However, the construction by Even and Mansour is a 
domain extender where the key size is twice the message size and we rule out 
reset indifferentiability for such extending constructions in Section 4. We note 
that it is an interesting open problem whether or not such (reset-) indifferentiable 
non-extending transformations exist. 


2.1 Indifferentiability 

Let us now recall the indifferentiability notion of Maurer et al. [TO] in the ver- 
sion by Coron et al. m who replace random systems by oracle Turing machines 
(resp., ideal primitives). Since we are concerned with different types of indifferen- 
tiability, we will sometimes use the term plain indifferentiability when referring 
to this original notion of indifferentiability. 

Definition 2. A Turing machine G with black-box access to an ideal primitive π 
is strongly indifferentiable from an ideal primitive Π if there exists a simulator 
S^Π such that for any distinguisher D there exists a negligible function negl such 
that 

\[ \left| \Pr\left[ D^{G^{\pi},\,\pi}(1^{\lambda}) = 1 \right] - \Pr\left[ D^{\Pi,\,S^{\Pi}}(1^{\lambda}) = 1 \right] \right| \le \mathrm{negl}(\lambda). \tag{1} \]

We say that the construction is weakly indifferentiable if for any D there 
exists a simulator S such that (1) holds. 
We will use the term real world to denote that the distinguisher D talks to the 
construction G^π and the primitive π, whereas in the ideal world, the distinguisher 
D talks to the “target” primitive Π and simulator S^Π. The goal of the 
distinguisher is to determine which of the two pairs of oracles he is talking to. Towards 
this goal, the distinguisher D queries its two oracles, of which one is called the 
honest interface h, which is either G^π (in the real world) or Π (in the ideal world). 
The other oracle is called the adversarial interface a and corresponds to either 
π (real world) or S^Π (ideal world). Thus, (h, a) := (G^π, π) if distinguisher D is 
in the real world and (h, a) := (Π, S^Π) if it is in the ideal world. The names h 
(honest) and a (adversarial) are in the style of [20] and suggestive: an honest 
party uses a construction as the designer intended; an adversary could, however, 
use the underlying building blocks to gain an advantage. 
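
The two worlds can be made concrete with a minimal Python sketch. The example is a toy chosen purely for illustration and is not taken from the paper: π is assumed to be a random function with 8-bit outputs, the construction G^π truncates π to its top 4 bits, Π is a random function with 4-bit outputs, and the (stateful) simulator pads Π's output with remembered random low bits. All names are hypothetical.

```python
import random

class LazyRO:
    """Lazily sampled random function from ints to `bits`-bit ints."""

    def __init__(self, bits: int, rng: random.Random):
        self.bits, self.rng, self.table = bits, rng, {}

    def __call__(self, x: int) -> int:
        if x not in self.table:
            self.table[x] = self.rng.getrandbits(self.bits)
        return self.table[x]

def real_world(rng):
    pi = LazyRO(8, rng)                  # primitive pi with 8-bit outputs
    G = lambda x: pi(x) >> 4             # construction G^pi: keep the top 4 bits
    return G, pi                         # (h, a) = (G^pi, pi)

def ideal_world(rng):
    Pi = LazyRO(4, rng)                  # target primitive Pi with 4-bit outputs
    low = LazyRO(4, rng)                 # simulator state: remembered low bits
    S = lambda x: (Pi(x) << 4) | low(x)  # simulator S^Pi emulating pi
    return Pi, S                         # (h, a) = (Pi, S^Pi)

def distinguisher(h, a, queries: int = 64) -> int:
    """Output 1 iff the honest interface is consistent with the adversarial one."""
    return int(all(h(x) == a(x) >> 4 for x in range(queries)))

def advantage(trials: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    real = sum(distinguisher(*real_world(rng)) for _ in range(trials))
    ideal = sum(distinguisher(*ideal_world(rng)) for _ in range(trials))
    return abs(real - ideal) / trials

estimated_advantage = advantage()
```

This particular consistency-checking distinguisher outputs 1 in both worlds, so its estimated advantage is zero. Note that the simulator has to remember the low bits it handed out, which is precisely the kind of state that the reset notion discussed next takes away.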

Reset Indifferentiability. Ristenpart et al. show [20] that, in general, we cannot 
securely replace a primitive Π by a construction G^π from primitive π if the 
construction is indifferentiable only. Instead, G^π needs to be (weakly) reset 
indifferentiable from Π, which extends the original indifferentiability definition by 
giving the distinguisher the power to reset the simulator at arbitrary times: 
Definition 3. Let the setup be as in Definition 2. An oracle Turing machine 
G^π is called strongly (resp., weakly) reset indifferentiable from ideal primitive 
Π if the condition of Definition 2 holds even when the distinguisher D can reset 
the simulator S to its initial state arbitrarily many times during the respective 
experiment. 

For reset indifferentiability the adversarial interface a in the real world simply ig- 
nores reset queries. Reset indifferentiability now allows composition in arbitrary 
games and not only in single-stage games, as does the original indifferentiability 
notion |20llt)j . 
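
Why resets hurt stateful simulators can be illustrated with a small Python sketch. It is a toy under stated assumptions and not a construction from the paper: the real primitive is a random function with 8-bit outputs, the target is a random function with 4-bit outputs, and the hypothetical simulator answers with the target's value followed by lazily sampled low bits that it stores in its state.

```python
import random

def make_target(rng):
    """Target primitive Pi: a lazily sampled random function with 4-bit outputs."""
    table = {}
    def Pi(x):
        if x not in table:
            table[x] = rng.getrandbits(4)
        return table[x]
    return Pi

class RealPrimitive:
    """Adversarial interface in the real world: pi itself; resets are ignored."""

    def __init__(self, rng):
        self.rng, self.table = rng, {}

    def query(self, x):
        if x not in self.table:
            self.table[x] = self.rng.getrandbits(8)
        return self.table[x]

    def reset(self):
        pass

class StatefulSimulator:
    """Adversarial interface in the ideal world: a simulator that keeps state."""

    def __init__(self, target, rng):
        self.target, self.rng, self.low = target, rng, {}

    def query(self, x):
        if x not in self.low:
            self.low[x] = self.rng.getrandbits(4)   # remembered only until a reset
        return (self.target(x) << 4) | self.low[x]

    def reset(self):
        self.low = {}                               # back to the initial state

def reset_distinguisher(a, queries: int = 64) -> int:
    """Output 1 iff the answers before and after a reset agree."""
    first = [a.query(x) for x in range(queries)]
    a.reset()
    second = [a.query(x) for x in range(queries)]
    return int(first == second)

rng = random.Random(7)
real_bit = reset_distinguisher(RealPrimitive(rng))
ideal_bit = reset_distinguisher(StatefulSimulator(make_target(rng), rng))
```

The real interface answers consistently, whereas the simulator resamples its low bits after the reset and is caught except with negligible probability. A simulator that survives this test has to give (almost always) the same answer to the same query without relying on state.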

3 Pseudo-deterministic Stateless Simulators for 
Indifferentiability 

Recall that the composition theorem by Maurer et al. [19] for plain 
indifferentiability holds for single-stage adversaries only. Their theorem says that if (i) the 
construction G^π is indifferentiable from the ideal primitive Π and if (ii) there 
is a reduction R that transforms a successful adversary A against some notion 
of security into an adversary R^A against a single-stage game in the presence of 
the ideal primitive Π, then also in the presence of the construction G^π there is 
a reduction R' that transforms a successful adversary A into an adversary R'^A 
against the single-stage game. 

In order to prove a general composition theorem, Ristenpart et al. [20] strengthen 
the notion of indifferentiability to account for the different stages of the adversary. 
They introduce the notion of (weak) reset indifferentiability and prove that the 
aforementioned theorem works for arbitrary games if the construction $G^\pi$ is reset 
indifferentiable from the ideal primitive $\Pi$. In contrast to plain indifferentiability, 
the distinguisher here gets an extra power, namely to reset the simulator at arbitrary 
times. Ristenpart et al. and Demay et al. remark that reset indifferentiability 
is equivalent to plain indifferentiability with stateless simulators. Intuitively, 
this follows from the observation that the distinguisher in the reset indifferentiability 
game can simply reset the simulator after each query it asks. We believe that, 
albeit equivalent, stateless simulators are often easier to handle than reset-resistant 
simulators. We thus explicitly introduce indifferentiability with stateless simulators 
as multi-stage indifferentiability and then prove that it is equivalent to reset 
indifferentiability. 

In Subsection 3.2 we prove that strong multi-stage indifferentiability implies 
that the simulators are also pseudo deterministic, a notion that we put forward 
in this section. Relative to a random oracle or an ideal cipher, we show how 
to derandomize pseudo-deterministic simulators, if the simulators are allowed to 
depend on the number of queries made by the distinguisher. 

3.1 Multi-stage Indifferentiability 

A stateless interactive algorithm is an algorithm whose behavior is statistically 
independent of the call/answer history of the algorithm. We now prove that 
indifferentiability with stateless simulators is equivalent to reset indifferentiability. 

Definition 4. A construction G with black-box access to primitive $\pi$ is strongly 
multi-stage indifferentiable from primitive $\Pi$ if there exists a stateless probabilistic 
polynomial-time simulator $\mathcal{S}$ (with access to $\Pi$) such that for any probabilistic 
polynomial-time distinguisher $\mathcal{D}$ there exists a negligible function negl such that: 

$$\left| \Pr\left[\mathcal{D}^{G^\pi,\pi}(1^\lambda) = 1\right] - \Pr\left[\mathcal{D}^{\Pi,\mathcal{S}^\Pi}(1^\lambda) = 1\right] \right| \le \mathrm{negl}(\lambda) \qquad (2)$$

We say that a construction $G^\pi$ is weakly multi-stage indifferentiable from $\Pi$ 
if for any probabilistic polynomial-time distinguisher $\mathcal{D}$ there exists a stateless 
probabilistic polynomial-time simulator $\mathcal{S}$ such that (2) holds. 

Lemma 1. A construction G with black-box access to primitive $\pi$ is weakly 
(resp., strongly) multi-stage indifferentiable from primitive $\Pi$ if and only if G is 
weakly (resp., strongly) reset indifferentiable from primitive $\Pi$. 

Proof. First note that any stateless simulator is, naturally, indifferent to resets 
and thus multi-stage indifferentiability implies reset indifferentiability. Moreover, 
strong reset indifferentiability implies strong multi-stage indifferentiability since 
the simulator for reset indifferentiability must work for any distinguisher, in 
particular for those which reset after each query. Hence this stateful simulator 
can be simply initialized and run by a stateless simulator (the stateless simulator 
does this for each query it receives). 

We now prove the remaining relation, i.e., that weak reset indifferentiability 
implies weak multi-stage indifferentiability. Assume that reset indifferentiability 
holds and consider an arbitrary distinguisher $\mathcal{D}$ in the multi-stage indifferentiability 
game. From this we construct a distinguisher $\mathcal{D}'$ for the reset indifferentiability 
game which runs $\mathcal{D}$ and sends a reset query to its adversarial a-interface 
after every a-query issued by $\mathcal{D}$. Let $\mathcal{S}'$ be the simulator for $\mathcal{D}'$ guaranteed to 
exist by reset indifferentiability. We construct a stateless simulator $\mathcal{S}$ for 
multi-stage indifferentiability which simply runs (the stateful) $\mathcal{S}'$ and resets its own 
state after each query. Now the following equations hold for $b \in \{0, 1\}$: 

$$\Pr\left[\mathcal{D}'^{\,\Pi,\mathcal{S}'}(1^\lambda) = b\right] = \Pr\left[\mathcal{D}'^{\,\Pi,\mathcal{S}}(1^\lambda) = b\right] = \Pr\left[\mathcal{D}^{\Pi,\mathcal{S}}(1^\lambda) = b\right].$$

Thus, if equation (2) holds for $(\mathcal{D}', \mathcal{S}')$, then it holds equally for $(\mathcal{D}, \mathcal{S})$. 
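
The wrapping step in this proof can be sketched in a few lines of Python. The sketch is illustrative only: `CountingSimulator` is a hypothetical stateful simulator invented for the example, and "reset" is modelled as constructing a fresh object.

```python
class CountingSimulator:
    """Hypothetical stateful simulator: the answer depends on the query count."""

    def __init__(self):
        self.seen = 0

    def query(self, q: int) -> int:
        self.seen += 1
        return q + self.seen


class StatelessWrapper:
    """Stateless simulator S: runs a freshly initialized copy of S' per query."""

    def __init__(self, factory):
        self._factory = factory

    def query(self, q: int) -> int:
        return self._factory().query(q)


def run_with_resets(factory, queries):
    """Distinguisher D' talking to S': a reset follows every a-query."""
    sim = factory()
    answers = []
    for q in queries:
        answers.append(sim.query(q))
        sim = factory()  # reset: back to the initial state
    return answers


queries = [3, 3, 5]
stateless = StatelessWrapper(CountingSimulator)
via_wrapper = [stateless.query(q) for q in queries]
via_resets = run_with_resets(CountingSimulator, queries)
```

Both lists agree, which mirrors the chain of equalities above: from the distinguisher's view, a stateful simulator that is reset after every query behaves exactly like its stateless wrapper.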


3.2 Pseudo-deterministic Algorithms 

Our notion of pseudo-deterministic algorithms intuitively captures that no 
distinguisher can query the algorithm on an input such that it returns something 
different from the most likely output. That is, the adversary wins if in its set 
of input/output pairs to the algorithm there is a query for which the algorithm 
did not return the most likely response. We also introduce a weak notion of this 
property, where we call $\mathcal{A}$ pseudo deterministic for a specific distinguisher if the 
probability of the distinguisher winning in the above experiment is negligible. 

Our notion of pseudo determinism can be seen as an average-case version 
of the pseudo-deterministic algorithms recently introduced by Goldreich et 
al. While they require probabilism to be hard to detect on any input, we 
only require indistinguishability for efficiently generatable inputs, on average. 

Definition 5. Let $\lambda$ be a security parameter and $\mathcal{A}^{\mathcal{O}}$ a stateless probabilistic 
polynomial-time oracle Turing machine with access to some oracle $\mathcal{O}$. Let 
$L[\mathcal{D}, \mathcal{A}, \mathcal{O}]$ denote the induced set of input/output pairs $(x, y)$ of $\mathcal{A}^{\mathcal{O}}$ when queried 
arbitrarily many times by the distinguisher $\mathcal{D}$, where $\mathcal{A}$ uses fresh coins in each 
run. We say that $\mathcal{A}^{\mathcal{O}}$ is pseudo deterministic if for all probabilistic polynomial-time 
distinguishers $\mathcal{D}$ there exists a negligible function negl such that 

$$\Pr_{\mathcal{D},\mathcal{A},\mathcal{O}}\left[\forall (x, y) \in L[\mathcal{D}, \mathcal{A}, \mathcal{O}] : y = y_{x,\mathcal{A}^{\mathcal{O}}}\right] \ge 1 - \mathrm{negl}(\lambda). \qquad (3)$$

The notation $y_{x,\mathcal{A}^{\mathcal{O}}}$ denotes the most likely output of $\mathcal{A}$ on input x over the 
randomness of $\mathcal{A}$, i.e., conditioned on a fixed oracle $\mathcal{O}$. If there are two equally 
likely answers on input x, we choose $y_{x,\mathcal{A}^{\mathcal{O}}}$ to be the lexicographically smaller 
one. 

We say algorithm $\mathcal{A}^{\mathcal{O}}$ is pseudo deterministic for distinguisher $\mathcal{D}^{\mathcal{A}^{\mathcal{O}},\mathcal{O}}(1^\lambda)$ 
if there exists a negligible function negl such that equation (3) holds for $\mathcal{D}$. 
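
The "most likely output" with its lexicographic tie-break can be computed exactly for toy algorithms by enumerating all coin strings. The sketch below does this; the two example algorithms (`noisy_parity`, `fair_coin`) and the convention that an algorithm is a function of an input and a tuple of coin bits are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def most_likely_output(algorithm, x, coin_bits):
    """Most likely output of algorithm(x, coins) over uniform coins.

    Ties are broken by taking the smallest output, as in Definition 5.
    """
    counts = Counter(
        algorithm(x, coins) for coins in product((0, 1), repeat=coin_bits)
    )
    best = max(counts.values())
    return min(y for y, c in counts.items() if c == best)


def noisy_parity(x, coins):
    """Hypothetical algorithm: parity of x, flipped only if all coins are 1."""
    parity = bin(x).count("1") % 2
    return parity ^ int(all(coins))


def fair_coin(x, coins):
    """Hypothetical algorithm whose two outputs are equally likely."""
    return "heads" if coins[0] else "tails"
```

With three coins, `noisy_parity` deviates from the parity on one of eight coin strings, so the parity is the most likely output. For `fair_coin` the tie is resolved in favor of the lexicographically smaller string.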

Note that the definition of $\mathcal{A}$ being pseudo deterministic for distinguisher $\mathcal{D}$ 
does not imply that it is hard to distinguish whether $\mathcal{A}$ is probabilistic or 
deterministic; it is only hard for the particular algorithm $\mathcal{D}$. Although this might 
sound like a weak and somewhat useless property, it will be sufficient to show 
that if a simulator is pseudo deterministic for a distinguisher, then the simulator 
can be entirely derandomized via random oracles/ideal ciphers. 

We now show that strong multi-stage indifferentiability implies that the sim- 
ulators are not only stateless but also pseudo deterministic. This is captured by 
the following lemma. 

Lemma 2. Let $G^\pi$ be a construction with black-box access to primitive $\pi$ which 
is strongly multi-stage indifferentiable from primitive $\Pi$. Then there is a stateless 
pseudo-deterministic probabilistic polynomial-time simulator $\mathcal{S}$ such that for all 
probabilistic polynomial-time distinguishers $\mathcal{D}$ equation (2) holds in the strong 
case. 

Proof. Let us assume there exists a stateless simulator $\mathcal{S}$ such that for all 
distinguishers $\mathcal{D}$ equation (2) holds and such that $\mathcal{S}$ is not pseudo deterministic. The 
latter implies that there exists a distinguisher $\mathcal{D}_{pd}$ against the pseudo determinism 
of simulator $\mathcal{S}$, i.e., there is a non-negligible probability that $\mathcal{D}_{pd}$ asks a query to 
$\mathcal{S}$ where $\mathcal{S}$ has a non-negligible probability of returning a different value than the 
most likely one. We now construct a distinguisher $\mathcal{D}'$ against strong multi-stage 
indifferentiability. Distinguisher $\mathcal{D}'$ runs $\mathcal{D}_{pd}$ on the adversarial a-interface. Let 
$q_1, \ldots, q_t$ be the queries asked by $\mathcal{D}_{pd}$. Distinguisher $\mathcal{D}'$ then sends the same 
queries once more to its a-interface and returns 1 if at least one response does 
not match and 0 otherwise. If $\mathcal{D}'$ is in the real world, talking to $G^\pi$ and $\pi$, 
algorithm $\mathcal{D}'$ will always output 0 as $\pi$ is a function. If, on the other hand, $\mathcal{D}'$ is in 
the ideal world, then $\mathcal{D}_{pd}$ will succeed with noticeable probability and hence $\mathcal{D}'$ 
will distinguish both worlds with noticeable probability, a contradiction. 
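
The replay strategy of $\mathcal{D}'$ is easy to sketch. In the Python fragment below, `real_pi` and `randomized_simulator` are hypothetical stand-ins for the adversarial interface in the real and ideal world, and the fixed seed only serves to make the example reproducible.

```python
import random


def replay_distinguisher(a_interface, queries):
    """D': ask every query twice and output 1 iff some answer changed."""
    first = [a_interface(q) for q in queries]
    second = [a_interface(q) for q in queries]
    return int(first != second)


def real_pi(q: int) -> int:
    """Real world: pi is a function, so repeated queries always agree."""
    return (q * q + 1) % 257


_coins = random.Random(2013)  # source of the simulator's fresh coins


def randomized_simulator(q: int) -> int:
    """Ideal world: a stateless simulator that is far from deterministic."""
    return q ^ _coins.getrandbits(64)
```

Against a function the distinguisher never sees a mismatch, whereas a simulator whose answers depend noticeably on its coins is caught by the second pass.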

Deterministic Simulators. Bennett and Gill prove in [7] that relative to a random 
oracle the complexity classes BPP and P are equivalent. Let us quickly sketch 
their idea. Given a probabilistic polynomial-time oracle Turing machine $M^{\mathcal{R}}$ 
which has access to random oracle $\mathcal{R}$ and which decides a language $\mathcal{L}$ in BPP, 
we can prove the existence of a deterministic polynomial-time Turing machine 
$D^{\mathcal{R}}$ which also decides $\mathcal{L}$. Let us denote by p(|x|) the runtime of machine $M^{\mathcal{R}}$ for 
inputs of length |x|. As $M^{\mathcal{R}}$ runs in polynomial time, there exists a polynomial 
upper bound p(|x|) on the length of queries $M^{\mathcal{R}}$ can pose to the random oracle. 
To derandomize $M^{\mathcal{R}}$ we construct a deterministic machine $D^{\mathcal{R}}$ which works 
analogously to $M^{\mathcal{R}}$ with the single exception that when $M^{\mathcal{R}}$ requests a random 
coin, $D^{\mathcal{R}}$ generates this coin deterministically by querying the random oracle 
on the next smallest input that cannot have been queried by $M^{\mathcal{R}}$ due to its 
runtime restriction. As the random oracle produces perfect randomness, the 
machines decide the same language with probability 1 over the choice of random 
oracle. 
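
The following Python sketch illustrates this derandomization idea under simplifying assumptions that are not part of the text: the oracle maps binary strings to single bits and is sampled lazily, the machine receives its coins through a callback, and the "inputs out of reach" are taken to be counter values padded to one bit more than the machine's maximal query length.

```python
import random


class LazyRandomOracle:
    """Random oracle from binary strings to one bit, sampled lazily."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._table = {}

    def __call__(self, x: str) -> int:
        if x not in self._table:
            self._table[x] = self._rng.getrandbits(1)
        return self._table[x]


def randomized_machine(x, oracle, coin):
    """Hypothetical machine M: one oracle query on x and two coin requests."""
    return oracle(x) ^ coin() ^ coin()


def derandomize(machine, max_query_len):
    """Replace coins by oracle answers on inputs that M itself cannot query."""

    def deterministic(x, oracle):
        counter = 0

        def coin():
            nonlocal counter
            # Reserved inputs are longer than any query M is able to make.
            reserved = format(counter, "b").zfill(max_query_len + 1)
            counter += 1
            return oracle(reserved)

        return machine(x, oracle, coin)

    return deterministic


oracle = LazyRandomOracle(seed=1)
det = derandomize(randomized_machine, max_query_len=4)
out_first = det("1010", oracle)
out_second = det("1010", oracle)
```

For a fixed oracle the resulting machine is deterministic, and its coins are oracle values that are independent of everything the original machine can query.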

Using the techniques developed by Bennett and Gill [7], we now show that in the 
multi-stage indifferentiability setting, if a simulator is pseudo deterministic for a 
distinguisher $\mathcal{D}$, then it can be derandomized in case the constructed primitive 
$\Pi$ is a random oracle or an ideal cipher. When applied to a simulator $\mathcal{S}$ that is 
universal for all distinguishers (strong indifferentiability), these derandomization 
techniques yield a family of simulators that depends only on the number of 
queries made by the distinguisher (weak indifferentiability). 

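Lemma 3. Let $\mathcal{A}^\Pi$ be a stateless probabilistic polynomial-time algorithm with 
oracle access to a random oracle $\mathcal{R}_{\ell,m}$ or an ideal cipher $\mathcal{E}_{k,n}$ for $\ell \in \omega(\log \lambda)$ 
(resp., $(k + n) \in \omega(\log \lambda)$). Let s be polynomial in $\lambda$. From $\mathcal{A}^\Pi$, we construct 
a deterministic algorithm $\mathcal{B}^\Pi$ such that the following holds: for all efficient 
distinguishers $\mathcal{D}$ that make fewer than s queries to their oracle, it holds that if $\mathcal{A}^\Pi$ is 
pseudo deterministic for $\mathcal{D}$, then 

$$\left| \Pr_{\Pi,\mathcal{D},\mathcal{A}}\left[\mathcal{D}^{\Pi,\mathcal{A}^\Pi}(1^\lambda) = 1\right] - \Pr_{\Pi,\mathcal{D}}\left[\mathcal{D}^{\Pi,\mathcal{B}^\Pi}(1^\lambda) = 1\right] \right|$$

is negligible, where the probability is over the choice of oracle $\Pi$ and algorithm 
$\mathcal{A}$’s and distinguisher $\mathcal{D}$’s internal coin tosses for the first case and over the 
choice of oracle $\Pi$ and distinguisher $\mathcal{D}$’s internal coin tosses in the second. 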

Proof. Let $\mathcal{A}^\Pi$ be a stateless algorithm with access to ideal primitive $\Pi$ where 
$\Pi$ is either a random oracle $\mathcal{R}_{\ell,m}$ or an ideal cipher $\mathcal{E}_{k,n}$. 

Let $\mathcal{D}$ be an efficient distinguisher for which $\mathcal{A}^\Pi$ is pseudo deterministic. As 
distinguisher $\mathcal{D}$ is efficient, there exists an upper bound $p(|\lambda|)$ on the number of 
queries to the $\Pi$-interface by $\mathcal{D}$. We construct a deterministic algorithm $\mathcal{B}$ which 
works as $\mathcal{A}$ with the only exception that $\mathcal{B}$ deterministically generates “random” 
bits by querying its oracle whenever $\mathcal{A}$ makes use of a random bit. For 
the jth requested random bit, algorithm $\mathcal{B}$ calls the $\Pi$-oracle (either random 
oracle $\mathcal{R}$ or ideal cipher $\mathcal{E}$, where it uses the encryption interface of $\mathcal{E}$) on $p(|\lambda|) + j$ 
distinct values, xor-ing the results and choosing a bit from the outcome. Note that 
as $\ell \in \omega(\log \lambda)$ (resp., $n + k \in \omega(\log \lambda)$) there exist sufficiently many distinct 
values. 
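
The bit-generation step of $\mathcal{B}$ can be sketched as follows. The text does not specify which distinct values are queried, so the disjoint consecutive blocks used here are an assumption, as is `toy_oracle`, which uses SHA-256 purely as a placeholder for the ideal primitive.

```python
import hashlib


def toy_oracle(x: int) -> int:
    """Placeholder for the oracle Pi (SHA-256 is only a heuristic stand-in)."""
    digest = hashlib.sha256(x.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big")


def reserved_inputs(p: int, j: int) -> list:
    """p + j distinct inputs for the j-th bit (j >= 1); blocks are disjoint."""
    start = sum(p + i for i in range(1, j))
    return list(range(start, start + p + j))


def derived_bit(oracle, p: int, j: int) -> int:
    """XOR the oracle answers on the reserved block and keep one bit."""
    acc = 0
    for x in reserved_inputs(p, j):
        acc ^= oracle(x)
    return acc & 1
```

One plausible reading of the choice of p + j values: a distinguisher that makes at most p oracle queries cannot touch all inputs of any block, so the XOR of the block presumably stays unpredictable to it.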

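Remember that we denote by $y_{q,\mathcal{A}^{\mathcal{O}}}$ the most likely output of algorithm $\mathcal{A}$ on 
input q conditioned on fixed oracle $\mathcal{O}$. We want to prove that 

$$\left| \Pr_{\Pi,\mathcal{D},\mathcal{A}}\left[\mathcal{D}^{\Pi,\mathcal{A}^\Pi}(1^\lambda) = 1\right] - \Pr_{\Pi,\mathcal{D}}\left[\mathcal{D}^{\Pi,\mathcal{B}^\Pi}(1^\lambda) = 1\right] \right|$$

is negligible in $\lambda$. We prove a stronger statement, namely, that the outputs 
of $\mathcal{A}$ and $\mathcal{B}$ are likely to be identical. We define event $\mathsf{C}$ capturing that “the 
outputs of $\mathcal{A}$ and $\mathcal{B}$ agree on all inputs.” Towards this goal we define event $\mathsf{A}$ as 
“algorithm $\mathcal{A}$ returns $y_{q_i,\mathcal{A}^\Pi}$ for all queries $q_i$,” where $y_{q_i,\mathcal{A}^\Pi}$ is the most likely 
answer of $\mathcal{A}^\Pi$ on input $q_i$, i.e., we set $y_{q_i,\mathcal{A}^\Pi} := \arg\max_y \{\Pr_R[\mathcal{A}^\Pi(q_i; R) = y]\}$ 
(cf. Definition 5). Likewise, we define event $\mathsf{B}$ as “algorithm $\mathcal{B}$ returns $y_{q_i,\mathcal{A}^\Pi}$ for 
all queries $q_i$.” We will show that 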


$$\Pr_{\Pi,\mathcal{D},\mathcal{A}}[\mathsf{A}] \ge 1 - \mathrm{negl} \qquad (4)$$

$$\Pr_{\Pi,\mathcal{D}}[\mathsf{B}] \ge 1 - \mathrm{negl}. \qquad (5)$$


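Clearly, the probability that $\mathcal{A}$ and $\mathcal{B}$ produce the same answers for all $q_i$ is 
lower bounded by the probability that $\mathcal{A}$ and $\mathcal{B}$ both output $y_{q_i,\mathcal{A}^\Pi}$ for all $q_i$. 
Thus, 

$$\begin{aligned}
\Pr_{\Pi,\mathcal{D},\mathcal{A}}[\mathsf{C}] &\ge \Pr_{\Pi,\mathcal{D},\mathcal{A}}[\mathsf{A} \wedge \mathsf{B}] \\
&= 1 - \Pr_{\Pi,\mathcal{D},\mathcal{A}}[\neg\mathsf{A} \vee \neg\mathsf{B}] \\
&\ge 1 - \left(\Pr_{\Pi,\mathcal{D},\mathcal{A}}[\neg\mathsf{A}] + \Pr_{\Pi,\mathcal{D}}[\neg\mathsf{B}]\right) \\
&\ge 1 - \mathrm{negl} - \mathrm{negl}.
\end{aligned}$$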

Let us now make these statements formal as well as prove inequalities (4) and 
(5). We denote by $q_i$ the queries to $\mathcal{A}$ by $\mathcal{D}$ and by $R_i$ the randomness used 
by $\mathcal{A}$ on query $q_i$. We say that event $\mathsf{A}$ occurs (over $\Pi, \mathcal{D}, R_1, \ldots, R_n$) if 

$$\forall i\colon\ \mathcal{A}^\Pi(q_i; R_i) = y_{q_i,\mathcal{A}^\Pi}.$$

Note that the pseudo-determinism of $\mathcal{A}$ for $\mathcal{D}$ directly implies that 

$$\Pr_{\Pi,\mathcal{D},R_1,\ldots,R_n}\left[\forall i\colon\ \mathcal{A}^\Pi(q_i; R_i) = y_{q_i,\mathcal{A}^\Pi}\right] \ge 1 - \mathrm{negl}, \qquad (6)$$

which establishes inequality (4). We say that event $\mathsf{B}$ occurs (over $\Pi, \mathcal{D}$) if 

$$\forall i\colon\ \mathcal{B}^\Pi(q_i) = y_{q_i,\mathcal{A}^\Pi},$$

where $q_i$ now denotes the queries by $\mathcal{D}$ to algorithm $\mathcal{B}$. Inequality (5) we derive 
from inequality (6) via an averaging argument. Note that in inequality (6) we 
consider fresh randomness $R_i$ for every query $q_i$. If for all queries $q_i$ a random 
choice of randomness is good with overwhelming probability, then a random 
choice of randomness is good for all $q_i$ with overwhelming probability: 

$$\Pr_{\Pi,\mathcal{D},R}\left[\forall i\colon\ \mathcal{A}^\Pi(q_i; R) = y_{q_i,\mathcal{A}^\Pi}\right] \ge 1 - \mathrm{negl}. \qquad (7)$$

Moreover, when considering the random oracle via lazy sampling, one can observe 
that the randomness generated by $\mathcal{B}$ from $\Pi$ is independent of the part of $\Pi$ 
that is used in the experiment, which yields that 

$$\Pr_{\Pi,\mathcal{D}}\left[\forall i\colon\ \mathcal{B}^\Pi(q_i) = y_{q_i,\mathcal{A}^\Pi}\right] = \Pr_{\Pi,\mathcal{D},R}\left[\forall i\colon\ \mathcal{A}^\Pi(q_i; R) = y_{q_i,\mathcal{A}^\Pi}\right] \ge 1 - \mathrm{negl}$$

as desired. 

4 The Random Oracle and Ideal Cipher Model Are 
Incomparable 

In this section we prove that the random oracle model and the ideal cipher model 
are incomparable with respect to strong multi-stage indifferentiability. We start 
by giving an alternative, simpler proof of the fact that multi-stage indifferentiable 
constructions cannot be built via domain extension (Lemma 4). [13] rule 
out domain extension even for a single bit of extension. In turn, we obtain an 
easier proof in the setting where the extension factor is super-logarithmic. In 
Section 4.1 we then present our duality lemma for multi-stage indifferentiability, 
which allows us to conclude that the ROM and the ICM are incomparable with 
respect to strong multi-stage indifferentiability. 

Lemma 4. Let $\mathcal{R}$ be a random oracle with domain $\{0,1\}^\ell$ (resp., $\mathcal{E}$ be an ideal 
cipher with domain $\{0,1\}^k \times \{0,1\}^n$) and $\pi$ be any ideal primitive with domain 
size $2^\nu$. For $\ell - \nu \in \omega(\log(\lambda))$ (resp., $k + n - \nu \in \omega(\log(\lambda))$) there exists no 
construction $G^\pi$ that is weakly multi-stage indifferentiable from $\mathcal{R}$ (resp., $\mathcal{E}$). 

We prove Lemma 4 for the random oracle case; the proof for ideal ciphers works 
analogously. Note that we prove the statement for weak multi-stage indifferentia- 
bility, thereby essentially ruling out any (possibly non-black-box) construction. 

In the following proof we consider a particular distinguisher that tests for the 
ideal world by forcing the simulator to query its oracle on a particular value 
M. We show that no simulator is able to do this with more than negligible 
probability since M is drawn from a very large set while the simulator, being 
stateless, is only able to make queries from a negligible fraction of this large set; 
it thus fails to pass the test. 

Proof (Proof of Lemma 4). Assume towards contradiction that there exists a 
construction $G^\pi$ that is weakly multi-stage indifferentiable from random oracle $\mathcal{R}$ 
and, hence, for every distinguisher $\mathcal{D}$ there exists a stateless simulator $\mathcal{S}$ such 
that $\mathcal{D}$ cannot distinguish between the real and ideal world. 

We consider a distinguisher $\mathcal{D}^{h,a}$ with access to honest and adversarial 
interfaces (h, a) which implement the random oracle $\mathcal{R}$ and simulator $\mathcal{S}$ in the ideal 
world and construction $G^\pi$ and ideal primitive $\pi$ in the real world. The 
distinguisher $\mathcal{D}$ chooses a message $M \in \{0,1\}^\ell$ uniformly at random and executes 
construction G via an internal simulation using its adversarial interface a, i.e., it 
computes $G^a(M)$. Then, the distinguisher asks its honest interface on message 
M to compute h(M) and returns 1 if the two results agree and 0 otherwise. Note 
that in the real world distinguisher $\mathcal{D}$ will always output 1. Thus, the simulator 
$\mathcal{S}$ has to ensure that $G^{\mathcal{S}}(M)$ is equal to $\mathcal{R}(M)$ with overwhelming probability 
over the choice of the random oracle $\mathcal{R}$. We now prove that, in the ideal 
world, the two values match only with negligible probability over the choice of 
the message M and the two settings can thus be distinguished by $\mathcal{D}$. 

Let us assume the ideal world and denote the query/response pairs to the 
a-interface by $(q_i, r_i)_{1 \le i \le t}$. We analyze the simulator’s behavior when it is asked 
these queries $q_1, \ldots, q_t$. If for none of the $q_i$ the simulator $\mathcal{S}$ asks the random oracle 
on M, then the answer of $G^{\mathcal{S}}(M)$ is independent of $\mathcal{R}(M)$ and thus different 
with overwhelming probability. By a simple counting argument, we now prove 
that, with high probability over the choice of M, on no query (not even one 
outside of the set $(q_i, r_i)_{1 \le i \le t}$) the simulator $\mathcal{S}$ asks $\mathcal{R}$ on M. For this, note 
that the queries which simulator $\mathcal{S}$ receives are of length $\nu$. Hence there are at 
most $2^\nu$ distinct possible queries to $\mathcal{S}$. Denote by c the upper bound on the 
number of queries that $\mathcal{S}$ asks to its random oracle over all possible queries 
that $\mathcal{S}$ itself receives. As the simulator $\mathcal{S}$ runs in polynomial time, c exists and is 
polynomial. Noting that $\mathcal{S}$ is stateless, we conclude that $\mathcal{S}$ asks at most $c \cdot 2^\nu \ll 2^\ell$ 
queries. Hence the probability that the distinguisher’s M is in the set 

$$\{M : \exists q\ \ \mathcal{S}^{\mathcal{R}} \text{ asks } M \text{ on input } q\}$$

is negligible. The probability that the distinguisher $\mathcal{D}$ returns 1 in the ideal world, 
where it is given access to simulator $\mathcal{S}$ and a random oracle $\mathcal{R}$, is therefore also 
negligible. Thus, the distinguisher $\mathcal{D}$ has a distinguishing advantage of almost 1, 
which concludes the proof. 
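
The counting argument can be checked numerically. The helper below is an illustrative sketch with hypothetical example parameters; it evaluates the bound $c \cdot 2^\nu / 2^\ell$ on the probability that a uniformly chosen M lies among the oracle queries the simulator can ever make.

```python
from fractions import Fraction


def hit_probability_bound(c: int, nu: int, ell: int) -> Fraction:
    """Upper bound c * 2^nu / 2^ell on Pr[M is ever queried by the simulator].

    c   : queries the simulator makes per input it receives
    nu  : bit length of the inputs the simulator receives
    ell : bit length of the uniformly chosen message M
    """
    return min(Fraction(1), Fraction(c * 2**nu, 2**ell))


# Example parameters (assumed): 1000 queries per input, nu = 64, ell = 128.
bound = hit_probability_bound(1000, 64, 128)
```

With these example values the bound equals 1000 / 2^64, which is below 2^-54. The bound becomes vacuous (equal to 1) once $\ell - \nu$ is too small relative to log c.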


4.1 The Duality Lemma for Multi-stage Indifferentiability 

We now prove the inverse direction, that is, an ideal cipher cannot be built 
from a random oracle with larger domain. In contrast to the previous section, 
we here give an impossibility result for strong multi-stage indifferentiability. Our 
result is, however, more general and of independent interest. Strong multi-stage 
indifferentiability guarantees the existence of a simulator that is stateless and 
deterministic. Constructions of ideal primitives often need to be stateless and 
deterministic as well. If, for example, the construction implements a publicly 
accessible function such as a hash function, it has to be stateless. Note that this 
is the case both for random oracles and ideal ciphers. 

Now, if we assume that constructions are deterministic and stateless, then 
we show that, in the case of multi-stage indifferentiability, we can exchange the 
role of the construction and the role of the simulator, if the simulator is also 
deterministic and stateless. Our Duality Lemma establishes that in this case, an 
impossibility result (resp. feasibility result) in one direction translates into an 
impossibility result (resp. feasibility result) in the other direction. However, if 
the simulator is not deterministic, but only pseudo deterministic, then we need 
to slightly adapt our notion of constructions to also allow pseudo-deterministic 
constructions. For this, note that pseudo-deterministic constructions are as useful 
as deterministic ones since inconsistencies due to the pseudo determinism can 
only be detected with negligible probability. Formally, however, they are not 
known to be equivalent, in particular because P ≠ BPP implies that pseudo- 
deterministic polynomial-time algorithms are more powerful than deterministic 
polynomial-time algorithms. 

We prove the Duality Lemma in the case of strong multi-stage indifferentia- 
bility. 

Lemma 5 (Duality Lemma for Multi-stage Indifferentiability). Let $\pi$ 
and $\pi'$ be two ideal primitives. Assuming constructions are stateless and pseudo 
deterministic, one of the following two statements holds: 

1. The two primitives are computationally equivalent, i.e., there exist constructions 
$G_1$, $G_2$ such that $G_1^{\pi}$ is strongly multi-stage indifferentiable from $\pi'$ and 
$G_2^{\pi'}$ is strongly multi-stage indifferentiable from $\pi$, or 

2. $\pi$ and $\pi'$ are incomparable with respect to strong multi-stage indifferentiability. 

In essence this means that a positive or negative result in either direction gives 
us a result for the other direction. As we have already seen a negative result for 
domain extenders, this gives us the result for the other directions, i.e., going from 
a large random oracle $\mathcal{R}$ to a small ideal cipher $\mathcal{E}$, or from a large ideal cipher 
$\mathcal{E}$ to a small random oracle $\mathcal{R}$. 

Proof (Proof of Lemma 5). Assume construction $G^\pi$ with black-box access to 
ideal primitive $\pi$ is strongly multi-stage indifferentiable from $\pi'$. Then by definition 
there exists a (pseudo-)deterministic, stateless simulator $\mathcal{S}$ such that no 
distinguisher $\mathcal{D}$ can tell apart the ideal world $(\pi', \mathcal{S}^{\pi'})$ from the real world $(G^\pi, \pi)$. 
Likewise, by definition, G is stateless and (pseudo-)deterministic. We now 
exchange the roles of construction G and simulator $\mathcal{S}$, thereby getting a new 
“construction” $\mathcal{S}^{\pi'}$ implementing primitive $\pi$. It remains to show that $\mathcal{S}^{\pi'}$ is strongly 
multi-stage indifferentiable from $\pi$. 

Let us assume the contrary. Then there exists a distinguisher $\mathcal{D}$ that can 
distinguish between the setting $(\pi', \mathcal{S}^{\pi'})$ and the setting $(G^\pi, \pi)$. This, however, 
contradicts the assumption that $G^\pi$ is strongly multi-stage indifferentiable from $\pi'$. 

An immediate consequence of the duality lemma and Lemma 4 is captured by 
the following corollary: 

Corollary 1. The ideal cipher model and the random oracle model are 
incomparable with respect to strong multi-stage indifferentiability. 

Remark 2. One interesting consequence of the duality lemma is best seen by 
an example: Can a random oracle with smaller domain be constructed from 
a random oracle with a larger domain? Intuitively, it feels natural to assume 
that this works. However, Lemma 4 tells us that the inverse is not possible 
and, thus, by the duality lemma we can directly conclude that any construction 
using a large random oracle cannot be strongly multi-stage indifferentiable from 
a small random oracle. So far, we have failed to either prove impossibility for 
weak multi-stage indifferentiability or to come up with a construction. We leave 
this for future work. 


5 Single versus Multi-reset 

Luykx et al. introduce the presumably weaker notion of n-reset indifferentiability, 
where the distinguisher is allowed to reset the simulator only n times. 
Naturally, for a construction that is n-reset indifferentiable the composition 
theorem holds for games that have n + 1 or fewer stages. In the following we show 
that, however, already the extreme single-reset notion implies full reset 
indifferentiability for simulators that do not depend on the distinguisher (i.e., the strong 
case). This yields that also for n-reset indifferentiability all our separations hold 
in a black-box fashion. 

What we prove is that the advantage of an n-reset distinguisher is bounded by 
the advantage of an $(n-1)$-reset distinguisher and that of a single-reset 
distinguisher, where the advantage of a distinguisher $\mathcal{D}$ in the n-reset indifferentiability 
game is defined as 

$$\mathrm{Adv}^{n\text{-reset}}_{G,\mathcal{S},\mathcal{D}}(1^\lambda) := \Pr\left[\mathcal{D}^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1\right] - \Pr\left[\mathcal{D}^{G^\pi,\pi}(1^\lambda) = 1\right].$$


Assuming that a construction is strongly single reset indifferentiable (and thus 
the advantage for any single-reset distinguisher is negligible) yields the above 
claim. We use 

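Lemma 6. Let $G^\pi$ be a construction with black-box access to primitive $\pi$. Then 
there exists simulator $\mathcal{S}$ such that for all n > 1 and all distinguishers $\mathcal{D}_n$ that 
make at most n reset queries there exists a distinguisher $\mathcal{D}_{n-1}$ that makes at 
most $n - 1$ reset queries and a distinguisher $\mathcal{D}_1$ that makes a single reset query 
and 

$$\mathrm{Adv}^{n\text{-reset}}_{G,\mathcal{S},\mathcal{D}_n}(1^\lambda) \le \mathrm{Adv}^{(n-1)\text{-reset}}_{G,\mathcal{S},\mathcal{D}_{n-1}}(1^\lambda) + \mathrm{Adv}^{1\text{-reset}}_{G,\mathcal{S},\mathcal{D}_1}(1^\lambda)$$

is negligible in $\lambda$. 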

The proof idea is simple. Given a distinguisher which makes n resets, we 
construct one that ignores the first reset. Now, either this changes the input/output 
behavior of the simulator noticeably, which yields a distinguisher that only needs 
a single reset, or it does not, in which case the distinguisher with $n - 1$ resets is 
as good as the n-reset distinguisher. 

Proof. Let $\mathcal{D}_n$ be a distinguisher that makes at most n reset queries. We 
construct a distinguisher $\mathcal{D}_{n-1}$ as follows. The distinguisher $\mathcal{D}_{n-1}$ runs exactly as 
$\mathcal{D}_n$ but does not perform the first reset query of $\mathcal{D}_n$. 
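
The step from $\mathcal{D}_n$ to $\mathcal{D}_{n-1}$ amounts to placing a thin filter between the distinguisher and the simulator. The Python sketch below shows such a filter; `ResettableSimulator` and the toy distinguisher `d_n` (with n = 2) are hypothetical and only serve to make the effect visible.

```python
class ResettableSimulator:
    """Hypothetical stateful simulator with an explicit reset operation."""

    def __init__(self):
        self.seen = 0
        self.resets = 0

    def query(self, q: int) -> int:
        self.seen += 1
        return q + self.seen

    def reset(self):
        self.seen = 0
        self.resets += 1


class IgnoreFirstReset:
    """Interface handed to D_n when it is run inside D_{n-1}."""

    def __init__(self, sim):
        self._sim = sim
        self._dropped = False

    def query(self, q: int) -> int:
        return self._sim.query(q)

    def reset(self):
        if not self._dropped:
            self._dropped = True  # swallow the first reset query
        else:
            self._sim.reset()


def d_n(sim):
    """Toy D_n with n = 2: query, reset, query, reset, query."""
    answers = [sim.query(0)]
    sim.reset()
    answers.append(sim.query(0))
    sim.reset()
    answers.append(sim.query(0))
    return answers
```

For this deliberately stateful toy simulator the two transcripts differ in the second answer, which is exactly the situation in which a single-reset distinguisher can detect the difference.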

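In the real world, where the distinguisher is connected to the construction $G^\pi$ 
and $\pi$, reset queries have no effect and thus we immediately have that 

$$\Pr_{r_{\mathcal{D}}}\left[\mathcal{D}_n^{G^\pi,\pi}(1^\lambda; r_{\mathcal{D}}) = 1\right] = \Pr_{r_{\mathcal{D}}}\left[\mathcal{D}_{n-1}^{G^\pi,\pi}(1^\lambda; r_{\mathcal{D}}) = 1\right] \qquad (8)$$

where the probability is over the random coins $r_{\mathcal{D}}$ of the distinguisher. 

In the ideal world, let $L_2[\mathcal{D}_n, \mathcal{S}, \mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}}]$ denote the ordered list of query- 
answer pairs of queries by distinguisher $\mathcal{D}_n$ to simulator $\mathcal{S}$ up to the second 
reset query by $\mathcal{D}_n$ when $\mathcal{D}_n$ runs with randomness $r_{\mathcal{D}}$, simulator $\mathcal{S}$ runs with 
randomness $r_{\mathcal{S}}$, and $\mathcal{R}$ is the random oracle. Note that after each reset query 
simulator $\mathcal{S}$ takes a fresh set of random coins. Thus, technically we have that 
$r_{\mathcal{S}} := r_{\mathcal{S}}^1 \| r_{\mathcal{S}}^2 \| \cdots$ where $r_{\mathcal{S}}^1$ denotes the simulator’s coins up to the first reset and 
$r_{\mathcal{S}}^2$ its coins after the first and up to the second reset. All further random coins 
are irrelevant for the definition of $L_2$ since we only consider queries up to the 
second reset query. 

Similarly, we define $L_1[\mathcal{D}_{n-1}, \mathcal{S}, \mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}}]$ to be the list of query-answer pairs 
by distinguisher $\mathcal{D}_{n-1}$ to simulator $\mathcal{S}$ up to the first reset query. Note that again 
$r_{\mathcal{S}} := r_{\mathcal{S}}^1 \| r_{\mathcal{S}}^2 \| \cdots$ but this time already the second part ($r_{\mathcal{S}}^2$) is irrelevant since 
we only consider queries up to the first reset query. 

Define predicate $E(\mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}})$ to hold iff 

$$L_2[\mathcal{D}_n, \mathcal{S}, \mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}}] = L_1[\mathcal{D}_{n-1}, \mathcal{S}, \mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}}]$$

for a random oracle $\mathcal{R}$ and randomnesses $r_{\mathcal{D}}$ and $r_{\mathcal{S}}$. Note that in case of event 
$E(\mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}})$ it holds that 

$$\Pr_{\mathcal{R},r_{\mathcal{D}},r_{\mathcal{S}}}\left[\mathcal{D}_n^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E(\mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}})\right] = \Pr_{\mathcal{R},r_{\mathcal{D}},r_{\mathcal{S}}}\left[\mathcal{D}_{n-1}^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E(\mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}})\right]. \qquad (9)$$

In the following we simplify notation and do not make the probability space 
explicit. That is, the probabilities in the ideal world are always over the random 
oracle $\mathcal{R}$, the random coins of the distinguisher $r_{\mathcal{D}}$, and the various random coins 
of the simulator $r_{\mathcal{S}}$. Also, we simply write E instead of $E(\mathcal{R}, r_{\mathcal{D}}, r_{\mathcal{S}})$. 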

Let $\mathcal{D}_1$ denote a distinguisher which makes only a single reset query and which 
works as follows: $\mathcal{D}_1$ runs $\mathcal{D}_n$ up to the second reset query, passing on queries 
to its own oracles but not passing on the two reset queries. Let $\vec{q}_1$ denote the 
queries to the simulator up to the first (ignored) reset query and $\vec{q}_2$ the queries 
to the simulator after the first (ignored) reset and up to the second (ignored) 
reset. Now, after the second ignored reset, distinguisher $\mathcal{D}_1$ makes its single reset 
query and once more sends the sequence $\vec{q}_2$ to the simulator. It outputs 0 in case 
the simulator’s answers are consistent with the previous $\vec{q}_2$ sequence and else it 
outputs 1. See Figure 1 for a pictorial representation of this operation. 

Fig. 1. Illustration of $\mathcal{D}_n$ and $\mathcal{D}_{n-1}$’s operation; circles denote queries and rectangles 
denote resets. The dashed part represents the resulting single-reset distinguisher $\mathcal{D}_1$ 
that asks the queries $\vec{q}_2$ twice (separated by a reset). Whether or not the answers to 
these two query sequences are identical is captured by the event E. 


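In the real world, distinguisher $\mathcal{D}_1$ will always output 0 since the answers will 
always match. Thus, we observe that 

$$\begin{aligned}
\mathrm{Adv}^{1\text{-reset}}_{G,\mathcal{S},\mathcal{D}_1}(1^\lambda) &= \Pr\left[\mathcal{D}_1^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1\right] - \Pr\left[\mathcal{D}_1^{G^\pi,\pi}(1^\lambda) = 1\right] \\
&= \Pr\left[\mathcal{D}_1^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1\right] \\
&\ge \Pr[\overline{E}] \cdot \Pr\left[\mathcal{D}_1^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid \overline{E}\right] \\
&= \Pr[\overline{E}]. \qquad (10)
\end{aligned}$$

For the last equality, note that if $\overline{E}$ occurs then there is at least one query answer 
that differs in both runs. This difference must be during $\vec{q}_2$ since, up to $\mathcal{D}_n$’s 
first reset, both algorithms are identical and operate on the same coins with the 
same oracles. Hence $\mathcal{D}_1$ always detects this difference and outputs 1. Thus, we 
have 

$$\begin{aligned}
\mathrm{Adv}^{n\text{-reset}}_{G,\mathcal{S},\mathcal{D}_n}(1^\lambda) &= \Pr\left[\mathcal{D}_n^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1\right] - \Pr\left[\mathcal{D}_n^{G^\pi,\pi}(1^\lambda) = 1\right] \\
&= \Pr[E] \cdot \Pr\left[\mathcal{D}_n^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E\right] + \Pr[\overline{E}] \cdot \Pr\left[\mathcal{D}_n^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid \overline{E}\right] - \Pr\left[\mathcal{D}_n^{G^\pi,\pi}(1^\lambda) = 1\right] \\
&\le \Pr\left[\mathcal{D}_n^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E\right] + \Pr[\overline{E}] - \Pr\left[\mathcal{D}_n^{G^\pi,\pi}(1^\lambda) = 1\right].
\end{aligned}$$

Using equations (8) and (9) we can exchange distinguisher $\mathcal{D}_n$ for distinguisher 
$\mathcal{D}_{n-1}$ and after reordering we get that 

$$= \Pr\left[\mathcal{D}_{n-1}^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E\right] - \Pr\left[\mathcal{D}_{n-1}^{G^\pi,\pi}(1^\lambda) = 1\right] + \Pr[\overline{E}].$$

Using equation (10) 

$$\begin{aligned}
&\le \Pr\left[\mathcal{D}_{n-1}^{\mathcal{R},\mathcal{S}^{\mathcal{R}}}(1^\lambda) = 1 \mid E\right] - \Pr\left[\mathcal{D}_{n-1}^{G^\pi,\pi}(1^\lambda) = 1\right] + \mathrm{Adv}^{1\text{-reset}}_{G,\mathcal{S},\mathcal{D}_1}(1^\lambda) \\
&\le \mathrm{Adv}^{(n-1)\text{-reset}}_{G,\mathcal{S},\mathcal{D}_{n-1}}(1^\lambda) + \mathrm{Adv}^{1\text{-reset}}_{G,\mathcal{S},\mathcal{D}_1}(1^\lambda)
\end{aligned}$$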
which yields the desired statement. 

Abstract. Fuzzy extractors derive strong keys from noisy sources. Their 
security is defined information-theoretically, which limits the length of 
the derived key, sometimes making it too short to be useful. We ask 
whether it is possible to obtain longer keys by considering computational 
security, and show the following. 

— Negative Result: Noise tolerance in fuzzy extractors is usually 
achieved using an information reconciliation component called a “se- 
cure sketch.” The security of this component, which directly affects 
the length of the resulting key, is subject to lower bounds from 
coding theory. We show that, even when defined computationally, 
secure sketches are still subject to lower bounds from coding the- 
ory. Specifically, we consider two computational relaxations of the 
information-theoretic security requirement of secure sketches, using 
conditional HILL entropy and unpredictability entropy. For both 
cases we show that computational secure sketches cannot outper- 
form the best information-theoretic secure sketches in the case of 
high-entropy Hamming metric sources. 

— Positive Result: We show that the negative result can be overcome 
by analyzing computational fuzzy extractors directly. Namely, we 
show how to build a computational fuzzy extractor whose output 
key length equals the entropy of the source (this is impossible in 
the information-theoretic setting). Our construction is based on the 
hardness of the Learning with Errors (LWE) problem, and is secure 
when the noisy source is uniform or symbol-fixing (that is, each 
dimension is either uniform or fixed). As part of the security proof, 
we show a result of independent interest, namely that the decision 
version of LWE is secure even when a small number of dimensions 
has no error. 

Keywords: Fuzzy extractors, secure sketches, key derivation, Learning 
with Errors, error-correcting codes, computational entropy, randomness 
extractors. 


1 Introduction 

Authentication generally requires a secret drawn from some high-entropy source. 
One of the primary building blocks for authentication is reliable key derivation. 
Unfortunately, many sources that contain sufficient entropy to derive a key are 
noisy, and provide similar, but not identical secret values at each reading 
(examples of such sources include biometrics [?], human memory [?], pictorial 
passwords [9], measurements of capacitance [35], timing [34], motion [10], 
quantum information [5], etc.). 

Fuzzy extractors [?] achieve reliable key derivation from noisy sources (see 
[7116111] for applications of fuzzy extractors). The setting consists of two algo- 
rithms: Generate (used once) and Reproduce (used subsequently). The Generate 
(Gen) algorithm takes an input w and produces a key r and a public value p. 
This information allows the Reproduce (Rep) algorithm to reproduce r given p 
and some value w' that is close to w (according to some predefined metric, such 
as Hamming distance). Crucially for security, knowledge of p should not reveal 
r; that is, r should be uniformly distributed conditioned on p. This feature is 
needed because p is not secret: for example, in a single-user setting (where the 
user wants to reproduce the key r from a subsequent reading w'), it would be 
stored in the clear; and in a key agreement application [7] (where two parties 
have w and w', respectively), it would be transmitted between the parties. 

Fuzzy extractors use ideas from information-reconciliation [5] and are defined 
(traditionally) as information-theoretic objects. The entropy loss of a fuzzy ex- 
tractor is the difference between the entropy of w and the length of the derived 
key r. In the information-theoretic setting, some entropy loss is necessary as the 
value p contains enough information to reproduce r from any close value w' . A 
goal of fuzzy extractor constructions is to minimize the entropy loss, increasing 
the security of the resulting application. Indeed, if the entropy loss is too high, 
the resulting secret key may be too short to be useful. 

We ask whether it is possible to obtain longer keys by considering computa- 
tional, rather than information theoretic, security. 

Our Negative Results. We first study (in Section 3) whether it could be fruitful 
to relax the definition of the main building block of a fuzzy extractor, called a 
secure sketch. A secure sketch is a one-round information reconciliation protocol: 
it produces a public value s that allows recovery of w from any close value w' . 
The traditional secrecy requirement of a secure sketch is that w has high 
min-entropy conditioned on s. This allows the fuzzy extractor of [?] to form the 
key r by applying a randomness extractor [?] to w, because randomness extractors 
produce random strings from strings with conditional min-entropy. We call this 
the sketch-and-extract construction. 

The most natural relaxation of the min-entropy requirement of the secure 
sketch is to require HILL entropy [21] (namely, that the distribution of w 
conditioned on s be indistinguishable from a high-min-entropy distribution). Under 
this definition, we could still use a randomness extractor to obtain r from w, 
because it would yield a pseudorandom key. Unfortunately, it is unlikely that such 
a relaxation will yield fruitful results: we prove in Theorem 1 that the entropy 
loss of such secure sketches is subject to the same coding bounds as the ones 
that constrain information-theoretic secure sketches. 

Another possible relaxation is to require that the value w is unpredictable 
conditioned on s. This definition would also allow the use of a randomness 
extractor to get a pseudorandom key, although it would have to be a special 
extractor, one that has a reconstruction procedure (see [?, Lemma 6]). 
Unfortunately, this relaxation is also unlikely to be fruitful: we prove in 
Theorem 2 that the unpredictability is at most log the size of the metric space 
minus log the volume of the ball of radius t. For high-entropy sources of w over 
the Hamming metric, this bound matches the best information-theoretic secure 
sketches. 

Our Positive Results. Both of the above negative results arise because a secure 
sketch functions like a decoder of an error-correcting code. To avoid them, we 
give up on building computational secure sketches and focus directly on the 
entropy loss in fuzzy extractors. Our goal is to decrease the entropy loss in a 
fuzzy extractor by allowing the key r to be pseudorandom conditioned on p. 

By considering this computational secrecy requirement, we construct the first 
lossless computational fuzzy extractors (Construction 1), where the derived key 
r is as long as the entropy of the source w. Our construction is for the Hamming 
metric and uses the code-offset construction [?], [?, Section 5] used in prior 
work, but with two crucial differences. First, the key r is not extracted from w 
as in the sketch-and-extract approach; rather, w “encrypts” r in a way that is 
decryptable with the knowledge of some close w' (this idea is similar to the way 
the code-offset construction is presented in [?] as a “fuzzy commitment”). Our 
construction uses private randomness, which is allowed in the fuzzy extractor 
setting but not in noiseless randomness extraction. Second, the code used is a 
random linear code, which allows us to use the Learning with Errors (LWE) 
assumption due to Regev [30, 31] and derive a longer key r. 

Specifically, we use the recent result of Döttling and Müller-Quade [17], which 
shows the hardness of decoding random linear codes when the error vector comes 
from the uniform distribution, with each coordinate ranging over a small interval. 
This allows us to use w as the error vector, assuming it is uniform. We also use 
a result of Akavia, Goldwasser, and Vaikuntanathan [1], which says that LWE 
has many hardcore bits, to hide r. 

Because we use a random linear code, our decoding is limited to reconciling 
a logarithmic number of differences. Unfortunately, we cannot utilize the results 
that improve the decoding radius through the use of trapdoors (such as [?]), 
because in a fuzzy extractor, there is no secret storage place for the trapdoor. 
If improved decoding algorithms are obtained for random linear codes, they 
will improve the error-tolerance of our construction. Given the hardness of 
decoding random linear codes [?], we do not expect significant improvement in the 
error-tolerance of our construction. 

In Section 5 we are able to relax the assumption that w comes from the uniform 
distribution, and instead allow w to come from a symbol-fixing source [?] 
(each dimension is either uniform or fixed). This relaxation follows from our 
results about the hardness of LWE when samples have a fixed (and adversarially 
known) error vector, which may be of independent interest (Theorem [?]). 

An Alternative Approach. Computational extractors [2613113] have the same goal 
of obtaining a pseudorandom key r from a source w in the setting without errors. 
They can be constructed, for example, by applying a pseudorandom generator to 
the output of an information-theoretic extractor. One way to build a 
computational fuzzy extractor is by using a computational extractor instead of the 
information-theoretic extractor in the sketch-and-extract construction of [?]. 
However, this approach is possible only if the conditional min-entropy of w 
conditioned on the sketch s is high enough. Furthermore, this approach does not 
allow the use of private randomness; private randomness is a crucial ingredient 
in our construction. We compare the two approaches in Section 4.4. 

2 Preliminaries 

For a random variable $X = X_1 \| \ldots \| X_n$ where each $X_i$ is over some alphabet 
$\mathcal{Z}$, we denote by $X_{1,\ldots,k} = X_1 \| \ldots \| X_k$. The min-entropy of $X$ is 
$H_\infty(X) = -\log(\max_x \Pr[X = x])$, and the average (conditional) min-entropy of $X$ 
given $Y$ is $\tilde{H}_\infty(X|Y) = -\log(\mathbb{E}_{y \in Y} \max_x \Pr[X = x \mid Y = y])$ [13, Section 2.4]. The 
statistical distance between random variables $X$ and $Y$ with the same domain 
is $\Delta(X, Y) = \frac{1}{2} \sum_x |\Pr[X = x] - \Pr[Y = x]|$. For a distinguisher $D$ (or a class 
of distinguishers $\mathcal{D}$) we write the computational distance between $X$ and $Y$ as 
$\delta^{D}(X, Y) = |\mathbb{E}[D(X)] - \mathbb{E}[D(Y)]|$. We denote by $\mathcal{D}_{s_{sec}}$ the class of randomized 
circuits which output a single bit and have size at most $s_{sec}$. For a metric space 
$(\mathcal{M}, \mathrm{dis})$, the (closed) ball of radius $t$ around $x$ is the set of all points within 
radius $t$, that is, $B_t(x) = \{y \mid \mathrm{dis}(x, y) \le t\}$. If the size of a ball in a metric 
space does not depend on $x$, we denote by $|B_t(\cdot)|$ the size of a ball of radius $t$. 
For the Hamming metric over $\mathcal{Z}^n$, $|B_t(\cdot)| = \sum_{i=0}^{t} \binom{n}{i} (|\mathcal{Z}| - 1)^i$. $U_n$ denotes the 
uniformly distributed random variable on $\{0, 1\}^n$. Usually, we use bold letters 
for vectors or matrices, capitalized letters for random variables, and lowercase 
letters for elements in a vector or samples from a random variable. 
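
As an illustration of the quantities just defined, the following is a minimal Python sketch for finite distributions given as dictionaries. The function names are chosen here for illustration and do not come from the paper; logarithms are taken base 2.

```python
import math

def min_entropy(dist):
    """H_inf(X) = -log2(max_x Pr[X = x]) for a dict {outcome: probability}."""
    return -math.log2(max(dist.values()))

def avg_min_entropy(joint):
    """Average min-entropy of X given Y for a joint dict {(x, y): probability}.

    Uses sum_y max_x Pr[X = x, Y = y], which equals E_y[max_x Pr[X = x | Y = y]].
    """
    best = {}
    for (x, y), p in joint.items():
        best[y] = max(best.get(y, 0.0), p)
    return -math.log2(sum(best.values()))

def statistical_distance(p, q):
    """Delta(X, Y) = 1/2 * sum_x |Pr[X = x] - Pr[Y = x]|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def hamming_ball_size(n, alphabet_size, t):
    """|B_t| = sum_{i=0}^{t} C(n, i) * (|Z| - 1)^i over Z^n."""
    return sum(math.comb(n, i) * (alphabet_size - 1) ** i for i in range(t + 1))
```

For example, for two uniform bits X = (X_1, X_2) with Y = X_1, the average min-entropy of X given Y is one bit, since only X_2 remains unknown.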

2.1 Fuzzy Extractors and Secure Sketches 

We now recall definitions and lemmas from the work of Dodis et al. [?, Sections 
2.5-4.1], adapted to allow for a small probability of error, as discussed in [13, 
Section 8]. Let $\mathcal{M}$ be a metric space with distance function dis. 

Definition 1. An $(\mathcal{M}, m, \ell, t, \epsilon)$-fuzzy extractor with error $\delta$ is a pair of 
randomized procedures, “generate” (Gen) and “reproduce” (Rep), with the following 
properties: 

1. The generate procedure Gen on input $w \in \mathcal{M}$ outputs an extracted string 
$r \in \{0, 1\}^\ell$ and a helper string $p \in \{0, 1\}^*$. 

2. The reproduction procedure Rep takes an element $w' \in \mathcal{M}$ and a bit string 
$p \in \{0, 1\}^*$ as inputs. The correctness property of fuzzy extractors guarantees 
that for $w$ and $w'$ such that $\mathrm{dis}(w, w') \le t$, if $R, P$ were generated by 
$(R, P) \leftarrow \mathrm{Gen}(w)$, then $\mathrm{Rep}(w', P) = R$ with probability (over the coins of 
Gen, Rep) at least $1 - \delta$. If $\mathrm{dis}(w, w') > t$, then no guarantee is provided about 
the output of Rep. 

3. The security property guarantees that for any distribution $W$ on $\mathcal{M}$ of 
min-entropy $m$, the string $R$ is nearly uniform even for those who observe $P$: if 
$(R, P) \leftarrow \mathrm{Gen}(W)$, then $\mathrm{SD}((R, P), (U_\ell, P)) \le \epsilon$. 

A fuzzy extractor is efficient if Gen and Rep run in expected polynomial time. 

Secure sketches are the main technical tool in the construction of fuzzy ex- 
tractors. Secure sketches produce a string s that does not decrease the entropy 
of w too much, while allowing recovery of w from a close w': 

Definition 2. An $(\mathcal{M}, m, \tilde{m}, t)$-secure sketch with error $\delta$ is a pair of 
randomized procedures, “sketch” (SS) and “recover” (Rec), with the following 
properties: 

1. The sketching procedure SS on input $w \in \mathcal{M}$ returns a bit string $s \in \{0, 1\}^*$. 

2. The recovery procedure Rec takes an element $w' \in \mathcal{M}$ and a bit string 
$s \in \{0, 1\}^*$. The correctness property of secure sketches guarantees that if 
$\mathrm{dis}(w, w') \le t$, then $\Pr[\mathrm{Rec}(w', \mathrm{SS}(w)) = w] \ge 1 - \delta$ where the probability is 
taken over the coins of SS and Rec. If $\mathrm{dis}(w, w') > t$, then no guarantee is 
provided about the output of Rec. 

3. The security property guarantees that for any distribution $W$ over $\mathcal{M}$ with 
min-entropy $m$, the value of $W$ can be recovered by the adversary who observes 
$s$ with probability no greater than $2^{-\tilde{m}}$. That is, $\tilde{H}_\infty(W \mid \mathrm{SS}(W)) \ge \tilde{m}$. 

A secure sketch is efficient if SS and Rec run in expected polynomial time. 

Note that in the above definition of secure sketches (resp., fuzzy extractors), 
the errors are chosen before s (resp., P) is known: if the error pattern between 
w and w' depends on the output of SS (resp., Gen), then there is no guarantee 
about the probability of correctness. 

A fuzzy extractor can be produced from a secure sketch and an average-case 
randomness extractor. An average-case extractor is a generalization of a strong 
randomness extractor [25, Definition 2] (in particular, Vadhan [35, Problem 6.8] 
showed that all strong extractors are average-case extractors with a slight loss 
of parameters): 

Definition 3. Let $\chi_1, \chi_2$ be finite sets. A function $\mathrm{ext} : \chi_1 \times \{0, 1\}^d \to \{0, 1\}^\ell$ is a 
$(m, \epsilon)$-average-case extractor if for all pairs of random variables $X, Y$ over 
$\chi_1, \chi_2$ such that $\tilde{H}_\infty(X|Y) \ge m$, we have 
$\Delta((\mathrm{ext}(X, U_d), U_d, Y), U_\ell \times U_d \times Y) \le \epsilon$. 

Lemma 1. Assume (SS, Rec) is an $(\mathcal{M}, m, \tilde{m}, t)$-secure sketch with error $\delta$, and 
let $\mathrm{ext} : \mathcal{M} \times \{0, 1\}^d \to \{0, 1\}^\ell$ be a $(\tilde{m}, \epsilon)$-average-case extractor. Then the 
following (Gen, Rep) is an $(\mathcal{M}, m, \ell, t, \epsilon)$-fuzzy extractor with error $\delta$: 

— Gen(w): generate $x \leftarrow \{0, 1\}^d$, set p = (SS(w), x), r = ext(w; x), and output 
(r, p). 

— Rep(w', (s, x)): recover w = Rec(w', s) and output r = ext(w; x). 
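
The sketch-and-extract construction of Lemma 1 can be illustrated with a minimal Python sketch. Two assumptions are made here purely for illustration and are not taken from the paper: the secure sketch is a code-offset sketch over bits using a 3-fold repetition code (which tolerates one flipped bit per block), and HMAC-SHA256 keyed by the seed stands in for the average-case extractor ext. The stand-in carries none of the lemma's information-theoretic guarantees.

```python
import hashlib
import hmac
import secrets

REP = 3  # repetition factor: each 3-bit block tolerates one flipped bit

def ss(w):
    """Code-offset sketch: s = w XOR c for a random repetition codeword c."""
    assert len(w) % REP == 0
    c = []
    for _ in range(len(w) // REP):
        c += [secrets.randbelow(2)] * REP
    return [wi ^ ci for wi, ci in zip(w, c)]

def rec(w_prime, s):
    """Recover w from a close w' by majority-decoding w' XOR s."""
    noisy = [a ^ b for a, b in zip(w_prime, s)]
    c = []
    for i in range(0, len(noisy), REP):
        bit = 1 if sum(noisy[i:i + REP]) * 2 > REP else 0
        c += [bit] * REP
    return [si ^ ci for si, ci in zip(s, c)]

def ext(w, seed):
    """Stand-in for ext(w; x): HMAC-SHA256 keyed by the seed (an assumption)."""
    return hmac.new(seed, bytes(w), hashlib.sha256).digest()[:16]

def gen(w):
    """Gen(w): sample a seed x, publish p = (SS(w), x), and derive r = ext(w; x)."""
    x = secrets.token_bytes(16)
    return ext(w, x), (ss(w), x)

def rep(w_prime, p):
    """Rep(w', (s, x)): recover w with Rec, then re-derive r = ext(w; x)."""
    s, x = p
    return ext(rec(w_prime, s), x)
```

Calling `gen(w)` once and later `rep(w2, p)` with a reading `w2` that differs from `w` in a single bit reproduces the same key, because `rec` restores `w` exactly before the extractor stand-in is applied.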

The main parameter we will be concerned with is the entropy loss of the 
construction. In this paper, we ask whether a smaller entropy loss can be achieved 
by considering a fuzzy extractor with a computational security requirement. We 
therefore relax the security requirement of Definition 1 to require a 
pseudorandom output instead of a truly random output. Also, for notational 
convenience, we modify the definition so that we can specify a general class of 
sources for which the fuzzy extractor is designed to work, rather than limiting 
ourselves to the class of sources that consists of all sources of a given 
min-entropy m, as in definitions above (of course, this modification can also be 
applied to prior definitions of information-theoretic secure sketches and fuzzy 
extractors). 

Definition 4 (Computational Fuzzy Extractor). Let $\mathcal{W}$ be a family of 
probability distributions over $\mathcal{M}$. A pair of randomized procedures “generate” 
(Gen) and “reproduce” (Rep) is a $(\mathcal{M}, \mathcal{W}, \ell, t)$-computational fuzzy extractor 
that is $(\epsilon, s_{sec})$-hard with error $\delta$ if Gen and Rep satisfy the following properties: 

— The generate procedure Gen on input $w \in \mathcal{M}$ outputs an extracted string 
$R \in \{0, 1\}^\ell$ and a helper string $P \in \{0, 1\}^*$. 

— The reproduction procedure Rep takes an element $w' \in \mathcal{M}$ and a bit string 
$P \in \{0, 1\}^*$ as inputs. The correctness property guarantees that for all $w, w'$ 
where $\mathrm{dis}(w, w') \le t$, if $(R, P) \leftarrow \mathrm{Gen}(w)$ then $\Pr[\mathrm{Rep}(w', P) = R] \ge 1 - \delta$ 
where the probability is over the randomness of (Gen, Rep). If $\mathrm{dis}(w, w') > t$, 
then no guarantee is provided about the output of Rep. 

— The security property guarantees that for any distribution $W \in \mathcal{W}$, the string 
$R$ is pseudorandom conditioned on $P$, that is, $\delta^{\mathcal{D}_{s_{sec}}}((R, P), (U_\ell, P)) \le \epsilon$. 

Any efficient fuzzy extractor is also a computational fuzzy extractor with the 
same parameters. 

Remark. Fuzzy extractor definitions make no guarantee about the behavior of Rep 
when the distance between w and w' is larger than t. In the information-theoretic 
setting this seemed inherent, as the “correct” R should be information-theoretically 
unknown conditioned on P. However, in the computational setting this is not true. 
Looking ahead, in our construction R is information-theoretically determined 
conditioned on P (with high probability over the coins of Gen). Our Rep algorithm 
will never output an incorrect key (with high probability over the coins of Gen) 
but may not terminate. However, it is not clear this is the desired behavior. For 
this reason, we leave the behavior of Rep ambiguous when dis(w, w') > t. 

3 Impossibility of Computational Secure Sketches 

In this section, we consider whether it is possible to build a secure sketch that 
retains significantly more computational than information-theoretic entropy. We 
consider two different notions for computational entropy, and for both of them 
show that corresponding secure sketches are subject to the same upper bounds 
as those for information-theoretic secure sketches. Thus, it seems that relaxing 
security of sketches from information-theoretic to computational does not help. 

In particular, for the case of the Hamming metric and inputs that have full 
entropy, our results are as follows. In Section 3.1 we show that a sketch that 
retains HILL entropy implies a sketch that retains nearly the same amount of 
min-entropy. In Section 3.2 we show that the computational unpredictability 
of a sketch is at most $\log|\mathcal{M}| - \log|B_t(\cdot)|$. Dodis et al. [?, Section 8.2] 
construct sketches with essentially the same information-theoretic security^1. In 
Section 3.3 we discuss mechanisms for avoiding these bounds. 

3.1 Bounds on Secure Sketches Using HILL Entropy 

HILL entropy is a commonly used computational notion of entropy [21]. It was 
extended to the conditional case by Hsiao, Lu, and Reyzin [22]. Here we recall a 
weaker definition due to Gentry and Wichs [?] (the term relaxed HILL entropy 
was introduced in [32]); since we show impossibility even for this weaker 
definition, impossibility for the stronger definition follows immediately. 

Definition 5. Let $(W, S)$ be a pair of random variables. $W$ has relaxed HILL 
entropy at least $k$ conditioned on $S$, denoted $H^{\mathrm{HILL\text{-}rlx}}_{\epsilon, s_{sec}}(W|S) \ge k$, if there exists a joint 
distribution $(X, Y)$ such that $\tilde{H}_\infty(X|Y) \ge k$ and $\delta^{\mathcal{D}_{s_{sec}}}((W, S), (X, Y)) \le \epsilon$. 

Intuitively, HILL entropy is as good as average min-entropy for all 
computationally bounded observers. Thus, redefining secure sketches using HILL 
entropy is a natural relaxation of the original information-theoretic definition; 
in particular, the sketch-and-extract construction in Lemma 1 would yield 
pseudorandom outputs if the secure sketch ensured high HILL entropy. We will 
consider secure sketches that retain relaxed HILL entropy: that is, we say that 
(SS, Rec) is a HILL-entropy $(\mathcal{M}, m, \tilde{m}, t)$ secure sketch that is $(\epsilon, s_{sec})$-hard with 
error $\delta$ if it satisfies Definition 2 with the security requirement replaced by 
$H^{\mathrm{HILL\text{-}rlx}}_{\epsilon, s_{sec}}(W \mid \mathrm{SS}(W)) \ge \tilde{m}$. 

Unfortunately, we will show below that such a secure sketch implies an 
error correcting code with approximately $2^{\tilde{m}}$ points that can correct t random 
errors (see [?, Lemma C.1] for a similar bound on information-theoretic secure 
sketches). For the Hamming metric, our result essentially matches the bound on 
information-theoretic secure sketches of [?, Proposition 8.2]. In fact, we show 
that, for the Hamming metric, HILL-entropy secure sketches imply 
information-theoretic ones with similar parameters, and, therefore, the HILL 
relaxation gives no advantage. 

The intuition for building error-correcting codes from HILL-entropy secure 
sketches is as follows. In order to have $H^{\mathrm{HILL\text{-}rlx}}_{\epsilon, s_{sec}}(W \mid \mathrm{SS}(W)) \ge \tilde{m}$, there must 
be a distribution $X, Y$ such that $\tilde{H}_\infty(X|Y) \ge \tilde{m}$ and $(X, Y)$ is computationally 
indistinguishable from $(W, \mathrm{SS}(W))$. Sample a sketch $s \leftarrow \mathrm{SS}(W)$. We know that 
SS followed by Rec likely succeeds on $W|s$ (i.e., $\mathrm{Rec}(w', s) = w$ with high 
probability for $w \leftarrow W|s$ and $w' \leftarrow B_t(w)$). Consider the following experiment: 
1) sample $y \leftarrow Y$, 2) draw $x \leftarrow X|y$ and 3) $x' \leftarrow B_t(x)$. By indistinguishability, 
$\mathrm{Rec}(x', y) = x$ with high probability. This means we can construct a large set 
$C$ from the support of $X|y$. $C$ will be an error correcting code and Rec an 
efficient decoder. We can then use standard arguments to turn this code into an 
information-theoretic sketch. 

1 The security in [15, Section 8.2] is expressed in terms of entropy of the error rate; 
recall that $\log B_t(\cdot) \approx H_q(t/n)$, where $n$ is the number of symbols, $q$ is the alphabet 
size, and $H_q$ is the $q$-ary entropy function. 

To make this intuition precise, we need an additional technical condition: 
sampling a random neighbor of a point is efficient. 

Definition 6. We say a metric space $(\mathcal{M}, \mathrm{dis})$ is $(s_{neigh}, t)$-neighborhood 
samplable if there exists a randomized circuit Neigh of size $s_{neigh}$ that for all $t' \le t$, 
$\mathrm{Neigh}(w, t')$ outputs a random point at distance $t'$ of $w$. 

We review the definition of a Shannon code [?]. 

Definition 7. Let $C$ be a set over space $\mathcal{M}$. We say that $C$ is a $(t, \epsilon)$-Shannon 
code if there exists an efficient procedure Rec such that for all $t' \le t$ and for 
all $c \in C$, $\Pr[\mathrm{Rec}(\mathrm{Neigh}(c, t')) \ne c] \le \epsilon$. To distinguish it from the average-error 
Shannon code defined below, we will sometimes call it a maximal-error Shannon 
code. 

This is a slightly stronger formulation than usual, in that for every size $t' \le t$ 
we require the code to correct $t'$ random errors^2. Shannon codes work for all 
codewords. We can also consider a formulation that works for an “average” 
codeword. 

Definition 8. Let $C$ be a distribution over space $\mathcal{M}$. We say that $C$ is a 
$(t, \epsilon)$-average error Shannon code if there exists an efficient procedure Rec such 
that for all $t' \le t$, $\Pr_{c \leftarrow C}[\mathrm{Rec}(\mathrm{Neigh}(c, t')) \ne c] \le \epsilon$. 

An average error Shannon code is one whose average probability of error is 
bounded by $\epsilon$. See [?, Pages 192-194] for definitions of average and maximal 
error probability. An average-error Shannon code is convertible to a 
maximal-error Shannon code with a small loss. We use the following pruning 
argument from [?, Pages 202-204] (we provide a proof in the full version [15]): 

Lemma 2. Let $C$ be a $(t, \epsilon)$-average error Shannon code with recovery procedure 
Rec such that $H_\infty(C) \ge k$. There is a set $C'$ with $|C'| \ge 2^{k-1}$ that is a $(t, 2\epsilon)$- 
(maximal error) Shannon code with recovery procedure Rec. 
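
The idea behind the pruning step can be illustrated in the special case where the code distribution is uniform over a finite set of codewords (an assumption made here for simplicity; the lemma itself covers any distribution with min-entropy at least k). If the average decoding error is at most ε, then by Markov's inequality fewer than half of the codewords can have error above 2ε, so discarding those keeps at least half of the code. A minimal Python sketch, with hypothetical names:

```python
def prune(error_prob, eps):
    """Keep codewords whose own decoding-error probability is at most 2*eps.

    error_prob maps each codeword to its error probability. Codewords are
    assumed to be uniformly distributed, so the average error is the plain mean.
    """
    avg = sum(error_prob.values()) / len(error_prob)
    assert avg <= eps, "input is not an eps-average-error code"
    return {c for c, p in error_prob.items() if p <= 2 * eps}
```

With eight codewords of which two fail with probability 1/4 and six never fail, the average error is 1/16, and pruning at 2ε = 1/8 retains the six reliable codewords, which is more than the guaranteed half.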

We can now formalize the intuition above and show that a sketch that retains 
$\tilde{m}$ bits of relaxed HILL entropy implies a good error correcting code with nearly 
$2^{\tilde{m}}$ points (proof in the full version of this work [15]). 

Theorem 1. Let $(\mathcal{M}, \mathrm{dis})$ be a $(s_{neigh}, t)$-neighborhood samplable metric space. 
Let (SS, Rec) be a HILL-entropy $(\mathcal{M}, m, \tilde{m}, t)$-secure sketch that is $(\epsilon, s_{sec})$-secure 
with error $\delta$. Let $s_{rec}$ denote the size of the circuit that computes Rec. If 
$s_{sec} \ge t(s_{neigh} + s_{rec})$, then there exists a value $s$ and a set $C$ with $|C| \ge 2^{\tilde{m}-2}$ that is 
a $(t, 4(\epsilon + t\delta))$-Shannon code with recovery procedure $\mathrm{Rec}(\cdot, s)$. 

2 In the standard formulation, the code must correct a random error of size up to t, 
which may not imply that it can correct a random error of a much smaller size t', 
because the volume of the ball of size t' may be negligible compared to the volume 
of the ball of size t. For codes that are monotone (if decoding succeeds on a set of 
errors, it succeeds on all subsets), these formulations are equivalent. However, we 
work with an arbitrary recover functionality that is not necessarily monotone. 

For the Hamming metric, any Shannon code (as defined in Definition 7) can be 
converted into an information-theoretic secure sketch (as described in [15, 
Section 8.2] and references therein). The idea is to use the code-offset 
construction, and convert worst-case errors to random errors by randomizing the 
order of the symbols of w first, via a randomly chosen permutation $\pi$ (which 
becomes part of the sketch and is applied to w' during Rec). The formal statement 
of this result can be expressed in the following lemma (which is implicit in [15, 
Section 8.2]). 


Lemma 3. For an alphabet $\mathcal{Z}$, let $C$ over $\mathcal{Z}^n$ be a $(t, \delta)$ Shannon code. Then 
there exists a $(\mathcal{Z}^n, m, m - (n \log|\mathcal{Z}| - \log|C|), t)$ secure sketch with error $\delta$ for 
the Hamming metric over $\mathcal{Z}^n$. 

Putting together Theorem 1 and Lemma 3 gives us the negative result for the 
Hamming metric: a HILL-entropy secure sketch (for the uniform distribution) 
implies an information-theoretic one with similar parameters: 

Corollary 1. Let $\mathcal{Z}$ be an alphabet. Let (SS', Rec') be an $(\epsilon, s_{sec})$-HILL-entropy 
$(\mathcal{Z}^n, n \log|\mathcal{Z}|, \tilde{m}, t)$-secure sketch with error $\delta$ for the Hamming metric over $\mathcal{Z}^n$, 
with Rec' of circuit size $s_{rec}$. If $s_{sec} \ge t(s_{rec} + n \log|\mathcal{Z}|)$, then there exists a 
$(\mathcal{Z}^n, n \log|\mathcal{Z}|, \tilde{m} - 2, t)$ (information-theoretic) secure sketch with error $4(\epsilon + t\delta)$. 

Note. In Corollary 1 we make no claim about the efficiency of the resulting 
(SS, Rec), because the proof of Theorem 1 is not constructive. 

Corollary 1 extends to non-uniform distributions: if there exists a distribution 
whose HILL sketch retains $\tilde{m}$ bits of entropy, then for all distributions $W$, there 
is an information-theoretic sketch that retains $H_\infty(W) - (n \log|\mathcal{Z}| - \tilde{m}) - 2$ bits 
of entropy. 

3.2 Bounds on Secure Sketches Using Unpredictability Entropy 

In the previous section, we showed that any sketch that retained HILL entropy 
could be transformed into an information theoretic sketch. However, HILL en- 
tropy is a strong notion. In this section, we therefore ask whether it is useful to 
consider a sketch that satisfies a minimal requirement: the value of the input is 
computationally hard to guess given the sketch. We begin by recalling the 
definition of conditional unpredictability entropy [22, Definition 7], which 
captures the notion of “hard to guess” (we relax the definition slightly, similarly 
to the relaxation of HILL entropy described in the previous section). 

Definition 9. Let $(W, S)$ be a pair of random variables. $W$ has relaxed 
unpredictability entropy at least $k$ conditioned on $S$, denoted by 
$H^{\mathrm{unp\text{-}rlx}}_{\epsilon, s_{sec}}(W|S) \ge k$, if there exists a pair of distributions $(X, Y)$ such that 
$\delta^{\mathcal{D}_{s_{sec}}}((W, S), (X, Y)) \le \epsilon$, and for all circuits $I$ of size $s_{sec}$, 

$\Pr[I(Y) = X] \le 2^{-k}$. 

A pair of procedures (SS, Rec) is an unpredictability-entropy $(\mathcal{M}, m, \tilde{m}, t)$ secure 
sketch that is $(\epsilon, s_{sec})$-hard with error $\delta$ if it satisfies Definition 2 with the 
security requirement replaced by $H^{\mathrm{unp\text{-}rlx}}_{\epsilon, s_{sec}}(W \mid \mathrm{SS}(W)) \ge \tilde{m}$. Note this notion is 
quite natural: combining such a secure sketch in a sketch-and-extract construction 
of Lemma 1 with a particular type of extractor (called a reconstructive 
extractor [?]) would yield a computational fuzzy extractor (per [?, Lemma 6]). 

Unfortunately, the conditional unpredictability entropy $\tilde{m}$ must decrease as t 
increases, as the following theorem states. (The proof of the theorem, generalized 
to more metric spaces, is in the full version [18].) 

Theorem 2. Let $\mathcal{Z}$ be an alphabet. Let (SS, Rec) be an unpredictability-entropy 
$(\mathcal{Z}^n, m, \tilde{m}, t)$-secure sketch that is $(\epsilon, s_{sec})$-secure with error $\delta$. If 
$s_{sec} \ge t(|\mathrm{Rec}| + n \log|\mathcal{Z}|)$, then $\tilde{m} \le n \log|\mathcal{Z}| - \log|B_t(\cdot)| + \log(1 - \epsilon - t\delta)$. 

In particular, if the input is uniform, the entropy loss is about $\log|B_t(\cdot)|$. As 
mentioned at the beginning of Section 3, essentially the same entropy loss can 
be achieved with information-theoretic secure sketches, by using the randomized 
code-offset construction. However, it is conceivable that unpredictability entropy 
secure sketches could achieve lower entropy loss with greater efficiency for some 
parameter settings. 
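
To give a feel for the size of this bound, the following minimal Python sketch evaluates the leading terms n log|Z| − log|B_t(·)| for the Hamming metric. It deliberately omits the theorem's log(1 − ε − tδ) term, which is close to zero when ε and tδ are small; the function names and the example parameters are chosen here for illustration only.

```python
import math

def log2_ball(n, z, t):
    """log2 of the Hamming ball volume |B_t| over an alphabet of size z."""
    return math.log2(sum(math.comb(n, i) * (z - 1) ** i for i in range(t + 1)))

def unpredictability_upper_bound(n, z, t):
    """Leading terms n*log2|Z| - log2|B_t| of the bound (slack term omitted)."""
    return n * math.log2(z) - log2_ball(n, z, t)
```

For binary strings of length 7 and t = 1, the ball has 8 points, so at most 7 − 3 = 4 bits can remain. This is the same count as the 2^4 codewords of the [7,4] Hamming code, which corrects one error.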


3.3 Avoiding Sketch Entropy Upper Bounds 

The lower bounds of Corollary 1 and Theorem 2 are strongest for high entropy 
sources. This is necessary: if a source contains only codewords (of an error 
correcting code), no sketch is needed, and thus there is no (computational) entropy 
loss. This same situation occurs when considering lower bounds for 
information-theoretic sketches [15, Appendix C]. 

Both lower bounds arise because Rec must function as an error-correcting 
code for many points of any indistinguishable distribution. It may be possible 
to avoid these bounds if Rec outputs a fresh random variable^3. Such an 
algorithm is called a computational fuzzy conductor. See [?] for the definition of a 
fuzzy conductor. To the best of our knowledge, a computational fuzzy conductor 
has not been defined in the literature; the natural definition is to replace the 
pseudorandomness condition in Definition 4 with a HILL entropy requirement. 

Our construction (in Section 4) has pseudorandom output and immediately 
satisfies the definition of a computational fuzzy extractor (Definition 4). It may 
be possible to achieve significantly better parameters with a construction that is 
a computational fuzzy conductor (but not a computational fuzzy extractor) and 
then applying an extractor. We leave this as an open problem. 


3 If some efficient algorithm can take the output of Rec and efficiently transform it back 
to the source W, the bounds of Corollary 1 and Theorem 2 both apply. This means 
that we need to consider constructions that are hard to invert (either 
information-theoretically or computationally). 

4 Computational Fuzzy Extractor Based on LWE 


In this section we describe our main construction. Security of our construction 
depends on the source W. We first consider a uniform source W; we consider 
other distributions in Section 5. Our construction uses the code-offset 
construction [23], [?, Section 5] instantiated with a random linear code over a 
finite field $\mathbb{F}_q$. Let Decode_t be an algorithm that decodes a random linear code 
with at most t errors (we will present such an algorithm later, in Section 4.2). 


Construction 1. Let n be a security parameter and let $m \ge n$. Let q be a prime. 
Define Gen, Rep as follows: 

Gen 

1. Input: $w \leftarrow W$ (where $W$ is some distribution over $\mathbb{F}_q^m$). 

2. Sample $A \in \mathbb{F}_q^{m \times n}$, $x \in \mathbb{F}_q^n$ uniformly. 

3. Compute $p = (A, Ax + w)$, $r = x_{1,\ldots,n/2}$. 

4. Output $(r, p)$. 

Rep 

1. Input: $(w', p)$ (where the Hamming distance between $w'$ and $w$ is at most $t$). 

2. Parse $p$ as $(A, c)$; let $b = c - w'$. 

3. Let $x = \mathrm{Decode}_t(A, b)$. 

4. Output $r = x_{1,\ldots,n/2}$. 
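
The mechanics of Construction 1 can be traced with a toy Python sketch. Everything below is an illustrative assumption rather than the paper's implementation: the parameters are far too small for any security, the helper names are hypothetical, and `decode` is a simple stand-in for Decode_t that repeatedly guesses n rows, solves the resulting square system over F_q, and accepts a candidate x only if A x differs from b in at most t coordinates.

```python
import random

def mat_vec(A, x, q):
    """Compute A x over F_q."""
    return [sum(a * b for a, b in zip(row, x)) % q for row in A]

def solve_mod(M, y, q):
    """Solve the square system M x = y over F_q (q prime); None if M is singular."""
    n = len(M)
    aug = [row[:] + [v] for row, v in zip(M, y)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % q), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, q)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % q for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % q for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def decode(A, b, q, t, rng, tries=1000):
    """Hypothetical Decode_t: guess n error-free rows, solve, then verify."""
    m, n = len(A), len(A[0])
    for _ in range(tries):
        rows = rng.sample(range(m), n)
        x = solve_mod([A[i] for i in rows], [b[i] for i in rows], q)
        if x is None:
            continue
        diff = [(bi - ai) % q for bi, ai in zip(b, mat_vec(A, x, q))]
        if sum(1 for d in diff if d) <= t:  # at most t coordinates disagree
            return x
    raise RuntimeError("decoding failed")

def gen(w, n, q, rng):
    """Gen: p = (A, A x + w), r = first n/2 coordinates of x."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(len(w))]
    x = [rng.randrange(q) for _ in range(n)]
    c = [(ax + wi) % q for ax, wi in zip(mat_vec(A, x, q), w)]
    return x[: n // 2], (A, c)

def rep(w_prime, p, q, t, rng):
    """Rep: b = c - w' equals A x plus at most t errors; decode and output r."""
    A, c = p
    b = [(ci - wi) % q for ci, wi in zip(c, w_prime)]
    x = decode(A, b, q, t, rng)
    return x[: len(x) // 2]
```

With n = 4, m = 16, q = 101 and t = 2, a reading w' that differs from w in two coordinates leaves b = A x + (w − w') with two nonzero error positions, so a random choice of four rows avoids both errors with probability 1001/1820 and the loop is expected to finish within a few iterations.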


Intuitively, security comes from the computational hardness of decoding 
random linear codes with a high number of errors (introduced by w). In fact, we 
know that decoding a random linear code is NP-hard [6]; however, this statement 
is not sufficient for our security goal, which is to show 

$\delta^{\mathcal{D}_{s_{sec}}}((X_{1,\ldots,n/2}, P), (U_{(n/2) \log q}, P)) \le \epsilon$. 

Furthermore, this construction is only useful if Decode_t can be efficiently 
implemented. 

The rest of this section is devoted to making these intuitive statements precise. 
We describe the LWE problem and the security of our construction in Section 4.1. 
We describe one possible polynomial-time Decode_t (which corrects more errors 
than is possible by exhaustive search) in Section 4.2. In Section 4.3 we describe 
parameter settings that allow us to extract as many bits as the input entropy, 
resulting in a lossless construction. In Section 4.4 we compare Construction 1 
to using a sketch-and-extract approach (Lemma 1) instantiated with a 
computational extractor. 


4.1 Security of Construction 1 

The LWE problem was introduced by Regev [30, 31] as a generalization of 
“learning parity with noise.” For a complete description of the LWE problem and 
related lattice problems (which we do not define here) see [30]. We now recall 
the decisional version of the problem. 

Definition 10 (Decisional LWE). Let n be a security parameter. Let $m = 
m(n) = \mathrm{poly}(n)$ be an integer and $q = q(n) = \mathrm{poly}(n)$ be a prime^4. Let $A$ be 
the uniform distribution over $\mathbb{F}_q^{m \times n}$, $X$ be the uniform distribution over $\mathbb{F}_q^n$ and 
$\chi$ be an arbitrary distribution on $\mathbb{F}_q^m$. The decisional version of the LWE problem, 
denoted dist-LWE$_{n,m,q,\chi}$, is to distinguish the distribution $(A, AX + \chi)$ from the 
uniform distribution over $(\mathbb{F}_q^{m \times n}, \mathbb{F}_q^m)$. 

We say that dist-LWE$_{n,m,q,\chi}$ is $(\epsilon, s_{sec})$-secure if no (probabilistic) 
distinguisher of size $s_{sec}$ can distinguish the LWE instances from uniform except 
with probability $\epsilon$. If for any $s_{sec} = \mathrm{poly}(n)$, there exists $\epsilon = \mathrm{ngl}(n)$ such that 
dist-LWE$_{n,m,q,\chi}$ is $(\epsilon, s_{sec})$-secure, then we say it is secure. 
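
The two distributions that a distinguisher must tell apart in Definition 10 can be sampled with a short Python sketch. The function names and the example error sampler are assumptions made for illustration; the sketch shows only how the instances are formed and says nothing about hardness at these toy sizes.

```python
import random

def lwe_instance(n, m, q, error_sampler, rng):
    """Return (A, b, x, e) with b = A x + e over F_q; only (A, b) is public."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    x = [rng.randrange(q) for _ in range(n)]
    e = [error_sampler(rng) % q for _ in range(m)]
    b = [(sum(a * xi for a, xi in zip(row, x)) + ei) % q
         for row, ei in zip(A, e)]
    return A, b, x, e

def uniform_instance(n, m, q, rng):
    """Return (A, u) with both components uniform, the 'random' side of the game."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    u = [rng.randrange(q) for _ in range(m)]
    return A, u
```

A distinguisher receives either the pair `(A, b)` from `lwe_instance` or the pair `(A, u)` from `uniform_instance` and outputs one bit; the problem is (ε, s_sec)-secure when no circuit of size s_sec has advantage above ε.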

Regev [30] and Peikert [25] show that dist-LWE$_{n,m,q,\chi}$ is secure when the 
distribution $\chi$ of errors is Gaussian, as follows. Let $\bar{\Psi}_\rho$ be the discretized Gaussian 
distribution with variance $(\rho q)^2 / 2\pi$, where $\rho \in (0, 1)$ with $\rho q > 2\sqrt{n}$. If GAPSVP 
and SIVP are hard to approximate (on lattices of dimension n) within 
polynomial factors for quantum algorithms, then dist-LWE$_{n,m,q,\bar{\Psi}_\rho}$ is secure. (A recent 
result of Brakerski et al. [8] shows security of LWE based on hardness of 
approximating lattice problems for classical algorithms. We have not considered how 
this result can be integrated into our analysis.) 

The above formulation of LWE requires the error term to come from the
discretized Gaussian distribution, which makes it difficult to use for
constructing fuzzy extractors (because using w and w' to sample Gaussian
distributions will increase the distance between the error terms and/or reduce
their entropy). Fortunately, recent work of Döttling and Müller-Quade [17] shows
the security of LWE, under the same assumptions, when errors come from the
uniform distribution over a small interval (see footnote 5). This allows us to
directly encode w as the error term in an LWE problem by splitting it into m
blocks. The size of these blocks is dictated by the following result of
Döttling and Müller-Quade:

Lemma 4. [17, Corollary 1] Let n be a security parameter. Let q = q(n) =
poly(n) be a prime and m = m(n) = poly(n) be an integer with $m > 3n$. Let
$\sigma \in (0, 1)$ be an arbitrarily small constant and let
$\rho = \rho(n) \in (0, 1/10)$ be such that $\rho q > 2 n^{1/2+\sigma} m$. If
the approximate decision-version of the shortest vector problem (GAPSVP) and
the shortest independent vectors problem (SIVP) are hard within a factor of
$O(n^{1+\sigma} m / \rho)$ for quantum algorithms in the worst case, then, for
$\chi$ the uniform distribution over $[-\rho q, \rho q]^m$,
dist-LWE$_{n,m,q,\chi}$ is secure.

To extract pseudorandom bits, we use a result of Akavia, Goldwasser, and
Vaikuntanathan [1] to show that X has simultaneously many hardcore bits. The
result says that if dist-LWE$_{n-k,m,q,\chi}$ is secure then any k variables of
X in a dist-LWE$_{n,m,q,\chi}$ instance are hardcore. We state their result for
a general error distribution (noting that their proof does not depend on the
error distribution):


4 Unlike in common formulations of LWE, where q can be any integer, we need q to 
be prime for decoding. 

5 Micciancio and Peikert provide a similar formulation in [27]. The result of
Döttling and Müller-Quade provides better parameters for our setting.
Lemma 5. [1, Lemma 2] If dist-LWE$_{n-k,m,q,\chi}$ is $(\epsilon, s_{sec})$-secure,
then

$$\delta^{\mathcal{D}_{s'_{sec}}}\big((X_{1,\ldots,k}, A, AX + \chi),\ (U, A, AX + \chi)\big) \le \epsilon,$$

where U denotes the uniform distribution over $\mathbb{F}_q^k$, A denotes the
uniform distribution over $\mathbb{F}_q^{m \times n}$, X denotes the uniform
distribution over $\mathbb{F}_q^n$, $X_{1,\ldots,k}$ denote the first k
coordinates of X, and $s'_{sec} \approx s_{sec} - n^3$.

The security of Construction 1 follows from Lemmas 4 and 5 when parameters
are set appropriately (see Theorem 3), because we use the hardcore bits of X as
our key.
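To make the roles of the public value and the key concrete, the following is a minimal sketch of a generation step in the spirit of the construction discussed here. It assumes, based on the later description of the public value as (A, AX + w), that the source reading w is used directly as the LWE error and that the key is the first k coordinates of X. The function name `gen`, the list-of-lists matrix representation, and the toy parameters are illustrative assumptions, not the authors' definition.

```python
import secrets


def gen(w, n, k, q):
    """Illustrative generation step (an assumption-laden sketch).

    w : list of m small integers (the source reading, used as the error term)
    n : dimension of the secret vector x
    k : number of coordinates of x used as the key
    q : a prime modulus
    Returns (key, (A, b)) with b = A*x + w over F_q.
    """
    m = len(w)
    # Uniform matrix A in F_q^{m x n} and uniform secret x in F_q^n.
    A = [[secrets.randbelow(q) for _ in range(n)] for _ in range(m)]
    x = [secrets.randbelow(q) for _ in range(n)]
    # b = A x + w (mod q); w hides x, rather than being extracted from.
    b = [(sum(a * xi for a, xi in zip(row, x)) + wi) % q
         for row, wi in zip(A, w)]
    key = x[:k]        # first k coordinates of x serve as the key
    helper = (A, b)    # public value
    return key, helper
```

A call such as `gen([1, -2, 0, 3, 2, -1, 0, 1, 2, -3, 1, 0], n=4, k=2, q=101)` returns a two-coordinate key together with a 12-row public value; the parameters are far too small to be secure and only illustrate the data flow.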


4.2 Efficiency of Construction 1

Construction 1 is useful only if Decode_t can be efficiently implemented. We need
a decoding algorithm for a random linear code with t errors that runs in
polynomial time. We present a simple Decode_t that runs in polynomial time and
can correct O(log n) errors (note that this corresponds to a superpolynomial
number of possible error patterns). This algorithm is a proof of concept,
and neither the algorithm nor its analysis has been optimized for constants. An
improved decoding algorithm can replace our algorithm, which will increase our
correcting capability and improve Construction 1.

Construction 2. We consider a setting of $(n, m, q, \chi)$ where $m > 3n$. We
describe Decode_t:

1. Input A, b = Ax + w - w'.

2. Randomly select rows without replacement $i_1, \ldots, i_{2n} \leftarrow [1, m]$.

3. Restrict A, b to rows $i_1, \ldots, i_{2n}$; denote these
   $A_{i_1,\ldots,i_{2n}}, b_{i_1,\ldots,i_{2n}}$.

4. Find n rows of $A_{i_1,\ldots,i_{2n}}$ that are linearly independent. If no
   such rows exist, output $\perp$ and stop.

5. Denote by A', b' the restriction of $A_{i_1,\ldots,i_{2n}}, b_{i_1,\ldots,i_{2n}}$
   (respectively) to these rows. Compute $x' = (A')^{-1} b'$.

6. If b - Ax' has more than t nonzero coordinates, go to step (2).

7. Output x'.

Each step is computable in time $O(n^3)$. For Decode_t to be efficient, we need t
to be small enough so that with probability at least 1/poly(n), none of the 2n
rows selected in step 2 have errors (i.e., so that w and w' agree on those rows).
If this happens, and $A_{i_1,\ldots,i_{2n}}$ has rank n (which is highly likely),
then x' = x, and the algorithm terminates. However, we also need to ensure
correctness: we need to make sure that if $x' \ne x$, we detect it in step 6.
This detection will happen if b - Ax' = A(x - x') + (w - w') has more than t
nonzero coordinates. It suffices to ensure that A(x - x') has at least 2t + 1
nonzero coordinates (because at most t of those can be zeroed out by w - w'),
which happens whenever the code generated by A has distance 2t + 1.
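The decoding procedure described above can be sketched in a few lines of Python. This is an unoptimized illustration under stated assumptions: matrices are lists of lists over a prime field, the independent rows of step 4 are found by Gauss-Jordan elimination, `None` stands in for the failure symbol, and the `max_tries` cap is an addition of this sketch (the procedure in the text simply loops). It relies on `pow(a, -1, q)` for modular inverses, which requires Python 3.8 or later.

```python
import random


def solve_from_independent_rows(rows, rhs, n, q):
    """Steps 4-5: pick n independent rows by Gauss-Jordan elimination
    over F_q and solve for x'. Returns None if the rank is below n."""
    M = [[e % q for e in list(r) + [v]] for r, v in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # fewer than n independent rows
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)        # modular inverse (q prime)
        M[col] = [(e * inv) % q for e in M[col]]
        for r in range(len(M)):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(e - f * p) % q for e, p in zip(M[r], M[col])]
    # Pivot rows now read [identity | x'], built only from the chosen rows.
    return [M[i][n] for i in range(n)]


def decode_t(A, b, t, q, max_tries=10_000):
    """Sketch of Decode_t: b = A x + e with at most t nonzero entries in e."""
    m, n = len(A), len(A[0])
    for _ in range(max_tries):
        idx = random.sample(range(m), 2 * n)              # step 2
        x = solve_from_independent_rows(                  # steps 3-5
            [A[i] for i in idx], [b[i] for i in idx], n, q)
        if x is None:
            return None                                   # step 4 failure
        residual = sum(
            1 for i in range(m)
            if (b[i] - sum(a * xi for a, xi in zip(A[i], x))) % q != 0)
        if residual <= t:                                 # step 6 check
            return x                                      # step 7
    return None
```

As a usage example, one can take A to be a small Vandermonde matrix (chosen here only because its code distance is known), corrupt one coordinate of Ax, and check that `decode_t` recovers x with t = 1.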




Setting $t = O(\frac{m}{n} \log n)$ is sufficient to ensure efficiency. Random
linear codes have distance at least $O(\frac{m}{n} \log n)$ with probability
$1 - e^{-\Omega(n)}$ (the exact statement is in Corollary 2), so this also
ensures correctness. The formal statement is below (proof in the full version
of this work [18]).

Lemma 6 (Efficiency of Decode_t when $t \le d(m/n - 2) \log n$). Let d be a
positive constant and assume that $\mathrm{dis}(W, W') \le t$ where
$t \le d(\frac{m}{n} - 2) \log n$. Then Decode_t runs in expected time
$O(n^{4d+3})$ operations in $\mathbb{F}_q$ (this expectation is over the choice
of random coins of Decode_t, regardless of the input, as long as
$\mathrm{dis}(w, w') \le t$). It outputs X with probability $1 - e^{-\Omega(n)}$
(this probability is over the choice of the random matrix A and random choices
made by Decode_t).
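The running time in Lemma 6 is driven by how often an iteration happens to sample only error-free rows. The following small calculation illustrates that factor under simplifying assumptions made for this sketch: exactly t of the m rows are in error, rank failures are ignored, and m - t is at least 2n. The function names are illustrative.

```python
from math import comb


def clean_sample_probability(m, n, t):
    """Probability that 2n rows drawn without replacement from m rows
    avoid all t rows on which w and w' differ."""
    return comb(m - t, 2 * n) / comb(m, 2 * n)


def expected_iterations(m, n, t):
    """Expected number of sampling rounds until an error-free sample,
    treating rounds as independent trials (requires m - t >= 2n)."""
    return 1 / clean_sample_probability(m, n, t)
```

For instance, with m = 16, n = 4 and t = 1, half of all samples of 8 rows avoid the single bad row, so about two rounds are expected; increasing t lowers the success probability and raises the expected number of rounds.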

4.3 Lossless Computational Fuzzy Extractor 

We now state a setting of parameters that yields a lossless construction. The
intuition is as follows. We are splitting our source into m blocks each of size
$\log \rho q$ (from Lemma 4) for a total input entropy of $m \log \rho q$. Our
key is derived from hardcore bits of X, namely $X_{1,\ldots,k}$, and is of size
$k \log q$ (from Lemma 5). Thus, to achieve a lossless construction we need
$k \log q = m \log \rho q$. In other words, in order to decode a meaningful
number of errors, the vector w is of higher dimension than the vector X, but
each coordinate of w is sampled using fewer bits than each coordinate of X.
Thus, by increasing the size of q (while keeping $\rho q$ fixed) we can set
$k \log q = m \log \rho q$, yielding a key of the same size as our source. The
formal statement is below.

Theorem 3. Let n be a security parameter and let the number of errors t =
c log n for some positive constant c. Let d be a positive constant (giving us a
tradeoff between running time of Rep and |w|). Consider the Hamming metric
over the alphabet $\mathcal{Z} = [-2^{b-1}, 2^{b-1}]$, where
$b = \log 2(c/d + 2)n^2 = O(\log n)$. Let W be uniform over
$\mathcal{M} = \mathcal{Z}^m$, where $m = (c/d + 2)n = O(n)$. If GAPSVP
and SIVP are hard to approximate within polynomial factors using quantum
algorithms, then there is a setting of q = poly(n) such that for any
polynomial $s_{sec}$ = poly(n) there exists $\epsilon$ = ngl(n) such that the
following holds: Construction 1 is a $(\mathcal{M}, W, m \log |\mathcal{Z}|, t)$-computational
fuzzy extractor that is $(\epsilon, s_{sec})$-hard with error
$\delta = e^{-\Omega(n)}$. The generate procedure Gen takes $O(n^2)$ operations
over $\mathbb{F}_q$, and the reproduce procedure Rep takes expected time
$O(n^{4d+3})$ operations over $\mathbb{F}_q$.

Proof. Security follows by combining Lemmas 4 and 5; efficiency follows by
Lemma 6. For a detailed explanation of the various parameters and constraints
see the full version of this work [18].

Theorem 3 shows that a computational fuzzy extractor can be built without
incurring any entropy loss. We can essentially think of AX + W as an encryption
of X where decryption works from any close W'.
4.4 Comparison with Computational-Extractor-Based
Constructions

As mentioned in the introduction, an alternative approach to building a computa- 
tional fuzzy extractor is to use a computational extractor (e.g. , [2613113] ) in place of 
the information-theoretic extractor in the sketch-and-extract construction. We will 
call this approach sketch-and-comp-extract. (A simple example of a computational
extractor is a pseudorandom generator applied to the output of an
information-theoretic extractor; note that LWE-based pseudorandom generators exist [2].)

This approach (specifically, its analysis via Lemma 1) works as long as the
amount of entropy $\tilde{m}$ of w conditioned on the sketch s remains high
enough to run a computational extractor. However, as discussed in Section 3,
$\tilde{m}$ decreases with the error parameter t due to coding bounds, and it is
conceivable that, if W has barely enough entropy to begin with, it will have too
little entropy left to run a computational extractor once s is known.

In contrast, our approach does not require the entropy of w conditioned on 
p = (A, AX + w) to be high enough for a computational extractor. Instead, we 
require that w is not computationally recoverable given p. This requirement is 
weaker — in particular, in our construction, w may have no information-theoretic 
entropy conditioned on p. The key difference in our approach is that instead of 
extracting from w, we hide secret randomness using w. Computational extractors 
are not allowed to have private randomness [25, Definition 3].

The main advantage of our analysis (instead of sketch-and-comp-extract) is 
that security need not depend on the error-tolerance t. In our construction, 
the error-tolerance depends only on the best available decoding algorithm for 
random linear codes, because decoding algorithms will not reach the information- 
theoretic decoding radius. 

Unfortunately, LWE parameter sizes require relatively long w. Therefore, in 
practice, sketch-and-comp-extract will beat our construction if the
computational extractor is instantiated efficiently based on assumptions other than LWE
(for example, a cryptographic hash function for an extractor and a block cipher 
for a PRG). However, we believe that our conceptual framework can lead to 
better constructions. Of particular interest are other codes that are easy to de- 
code up to t errors but become computationally hard as the number of errors 
increases. 

To summarize, the advantage of Construction 1 is that the security of our
construction does not depend on the decoding radius t. The disadvantages of
Construction 1 are that it supports a limited number of errors and only a
uniformly distributed source. We begin to address this second problem in the
next section.

5 Computational Fuzzy Extractor for Nonuniform 
Sources 

While showing the security of Construction 1 for arbitrary high-min-entropy
distributions is an open problem, in this section we show it for a particular class
of distributions called symbol-fixing. First we recall the notion of a
symbol-fixing source (from [24, Definition 2.3]):

Definition 11. Let $W = (W_1, \ldots, W_{m+\alpha})$ be a distribution where each
$W_i$ takes values over an alphabet $\mathcal{Z}$. We say that it is a
$(m + \alpha, m, |\mathcal{Z}|)$ symbol fixing source if for $\alpha$ indices
$i_1, \ldots, i_\alpha$, the symbols $W_{i_1}, \ldots, W_{i_\alpha}$ are fixed,
and the remaining m symbols are chosen uniformly at random. Note that
$H_\infty(W) = m \log |\mathcal{Z}|$.
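As an illustration of Definition 11, the sketch below builds a sampler for a symbol-fixing source and evaluates its min-entropy. The function names, the use of Python's `random` module, and the choice to pick the fixed positions and values at random when the sampler is created are assumptions of this example; the definition only requires that some set of positions be fixed.

```python
import math
import random


def make_symbol_fixing_sampler(m, alpha, alphabet, rng=None):
    """Return (fixed, sample) for an (m + alpha, m, |alphabet|) source:
    alpha positions carry fixed symbols, the other m are uniform."""
    rng = rng or random.Random()
    positions = rng.sample(range(m + alpha), alpha)
    fixed = {i: rng.choice(alphabet) for i in positions}

    def sample():
        # Fixed positions never change; the rest are fresh uniform symbols.
        return [fixed[i] if i in fixed else rng.choice(alphabet)
                for i in range(m + alpha)]

    return fixed, sample


def min_entropy_bits(m, alphabet_size):
    """Min-entropy m * log2|Z| of such a source, in bits."""
    return m * math.log2(alphabet_size)
```

For example, with m = 8 free symbols over a 4-letter alphabet, `min_entropy_bits(8, 4)` evaluates to 16 bits, regardless of how many positions are fixed.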

Symbol-fixing sources are a very structured class of distributions. However,
extending Construction 1 to such a class is not obvious. Although symbol-fixing
sources are deterministically extractible [24], we cannot first run a deterministic
extractor before using Construction 1. This is because we need to preserve
distance between w and w' and an extractor must not preserve distance between
input points. We present an alternative approach, showing security of LWE
directly with symbol-fixing sources.

The following theorem states the main technical result of this section, which 
is of potential interest outside our specific setting. The result is that dist-LWE 
with symbol-fixing sources is implied by standard dist-LWE (but for n and m 
reduced by the amount of fixed symbols). 

Theorem 4. Let n be a security parameter, m, $\alpha$ be polynomial in n, and
q = poly(n) be a prime and $\beta \in \mathbb{Z}^+$ be such that
$q^{-\beta}$ = ngl(n). Let U denote the uniform distribution over
$\mathcal{Z}^m$ for an alphabet $\mathcal{Z} \subset \mathbb{F}_q$, and let W
denote an $(m + \alpha, m, |\mathcal{Z}|)$ symbol fixing source over
$\mathcal{Z}^{m+\alpha}$. If dist-LWE$_{n,m,q,U}$ is secure, then
dist-LWE$_{n+\alpha+\beta, m+\alpha, q, W}$ is also secure.

Theorem 4 also holds for an arbitrary error distribution (not just uniform error)
in the following sense. Let $\chi'$ be an arbitrary error distribution. Define
$\chi$ as the distribution where m dimensions are sampled according to $\chi'$
and the remaining dimensions have some fixed error. Then, security of
dist-LWE$_{n,m,q,\chi'}$ implies security of
dist-LWE$_{n+\alpha+\beta, m+\alpha, q, \chi}$. We prove this stronger version
of the theorem in the full version of this work [18].

The intuition for this result is as follows. Providing a single sample with
no error "fixes" at most a single variable. Thus, if there are significantly more
variables than samples with no error, search LWE should still be hard. We are
able to show a stronger result that dist-LWE is still hard. The nontrivial part of
the reduction is using the additional $\alpha + \beta$ variables to "explain" a
random value for the last $\alpha$ samples, without knowing the other variables.
The $\beta$ parameter is the slack needed to ensure that the "free" variables
have influence on the last $\alpha$ samples. A similar theorem for the case of a
single fixed dimension was shown in concurrent work by Brakerski et al.
[8, Lemma 4.3]. The proof techniques of Brakerski et al. can be extended to our
setting with multiple fixed dimensions, improving the parameters of Theorem 4
(specifically, removing the need for $\beta$).

Theorem 4 allows us to construct a lossless computational fuzzy extractor
from block-fixing sources:

Theorem 5. Let n be a security parameter and let t = c log n for some positive
constant c. Let d < c be a positive constant and consider the Hamming metric
over the alphabet $\mathcal{Z} = [-2^{b-1}, 2^{b-1}]$, where
$b \approx \log 2(c/d + 2)n^2 = O(\log n)$. Let
$\mathcal{M} = \mathcal{Z}^{m+\alpha}$ where $m = (c/d + 2)n = O(n)$ and
$\alpha < n/3$. Let $\mathcal{W}$ be the class of all
$(m + \alpha, m, |\mathcal{Z}|)$-symbol fixing sources. If GAPSVP and SIVP
are hard to approximate within polynomial factors using quantum algorithms,
then there is a setting of q = poly(n) such that for any polynomial $s_{sec}$ =
poly(n) there exists $\epsilon$ = ngl(n) such that the following holds:
Construction 1 is a $(\mathcal{M}, \mathcal{W}, m \log |\mathcal{Z}|, t)$-computational
fuzzy extractor that is $(\epsilon, s_{sec})$-hard with error
$\delta = e^{-\Omega(n)}$. The generate procedure Gen takes $O(n^2)$ operations
over $\mathbb{F}_q$, and the reproduce procedure Rep takes expected time
$O(n^{4d+3} \log n)$ operations over $\mathbb{F}_q$.

Proof. Security follows by Lemmas 4 and 5 and Theorem 4. Efficiency follows
by Lemma 6. For a more detailed explanation of parameters see the full version
of this work [18].
A Properties of Random Linear Codes

For efficient decoding of Construction 1, we need the LWE instance to have high
distance with overwhelming probability. We will use the q-ary entropy function,
denoted $H_q(x)$ and defined as
$H_q(x) = x \log_q(q - 1) - x \log_q x - (1 - x) \log_q(1 - x)$.
Note that $H_2(x) = -x \log x - (1 - x) \log(1 - x)$. In the region [0, 1/2], for
any value q' > q, $H_{q'}(x) \le H_q(x)$. The following theorem is standard in
coding theory:

Theorem 6. [?, Theorem 8] For prime q, $\delta \in [0, 1 - 1/q)$,
$0 < \epsilon < 1 - H_q(\delta)$ and sufficiently large m, the following holds
for $n = \lceil (1 - H_q(\delta) - \epsilon) m \rceil$. If
$A \in \mathbb{F}_q^{m \times n}$ is drawn uniformly at random, then the linear
code with A as a generator matrix has rate at least
$(1 - H_q(\delta) - \epsilon)$ and relative distance at least $\delta$ with
probability at least $1 - e^{-\Omega(m)}$.
Our setting is the case where $m = \mathrm{poly}(n) \ge 2n$ and
$\delta = O(\log n / n)$. This setting of parameters satisfies Theorem 6.
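The q-ary entropy function used throughout this appendix is easy to evaluate numerically. The helper below is an illustrative sketch (the function name is an assumption of this example) that follows the definition given at the start of the appendix and can be used to check values such as H_2(0.1) against the bound 1/2.

```python
import math


def q_ary_entropy(x, q):
    """H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x) for 0 <= x <= 1."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 0:
        return 0.0                      # limit value at x = 0
    if x == 1:
        return math.log(q - 1, q)       # only the first term survives
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))
```

Evaluating `q_ary_entropy(0.1, 2)` gives a value just below 0.47, and the value at x = 0.1 decreases as q grows, in line with the monotonicity remark above.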

Corollary 2. Let n be a parameter and let $m = \mathrm{poly}(n) \ge 2n$. Let q
be a prime and $t = O(\frac{m}{n} \log n)$. For large enough values of n, when
$A \in \mathbb{F}_q^{m \times n}$ is drawn uniformly, the code generated by A
has distance at least t with probability at least
$1 - e^{-\Omega(m)} \ge 1 - e^{-\Omega(n)}$.


Proof. Let c be some constant. Let $\delta = t/m = \frac{c \log n}{n}$. We show
the corollary for the case when m = 2n (increasing the size of m only increases
the relative distance). It suffices to show that for sufficiently large n, there
exists $\epsilon > 0$ where $1 - H_q(\frac{c \log n}{n}) - \epsilon = 1/2$, or
equivalently that $H_q(\frac{c \log n}{n}) < 1/2$, as then setting
$\epsilon = 1/2 - H_q(\frac{c \log n}{n})$ satisfies Theorem 6. For sufficiently
large n:

- $\frac{c \log n}{n} < 1/2$, so we can work with the binary entropy function $H_2$.

- $\frac{c \log n}{n} < .1 < 1/2$ and thus $H_q(\frac{c \log n}{n}) < H_q(.1)$.

Putting these statements together, for large enough n,
$H_q(\frac{c \log n}{n}) < H_q(.1) \le H_2(.1) < 1/2$ as desired. This completes
the proof.
Abstract. We introduce explicit schemes based on the polarization
phenomenon for the task of secret-key agreement from common information
and one-way public communication as well as for the task of private
channel coding. Our protocols are distinct from previously known schemes in
that they combine two practically relevant properties: they achieve the
ultimate rate (defined with respect to a strong secrecy condition) and
their complexity is essentially linear in the blocklength. However, we are
not able to give an efficient algorithm for code construction.

Keywords: One-way secret-key agreement, private channel coding, 
one-way secret-key rate, secrecy capacity, wiretap channel scenario, more 
capable, less noisy, degraded, polarization phenomenon, polar codes, 
practically efficient, strongly secure. 


1 Introduction 

Consider two parties, Alice and Bob, connected by an authentic but otherwise 
fully insecure communication channel. It has been shown that without having 
access to additional resources, it is impossible for them to communicate privately, 
with respect to an information-theoretic privacy condition m- In particular 
they are unable to generate an unconditionally secure key with which to encrypt 
messages transmitted over the public channel. However, if Alice and Bob have 
access to correlated randomness about which an adversary (Eve) has only partial 
knowledge, the situation changes completely: information-theoretically secure 
secret-key agreement and private communication become possible. Alternatively, 
if Alice and Bob are connected by a noisy discrete memoryless channel (DMC) 
to which Eve has only limited access (the so-called wiretap channel scenario of
Wyner [3], Csiszár and Körner [4], and Maurer [5]), private communication is
again possible.

In this paper, we present explicit schemes for efficient one-way secret-key 
agreement from common randomness and for private channel coding in the wire- 
tap channel scenario. As discussed in Section 1231 we improve previous work that 
requires extra assumptions about the structure of the wiretap channel or/and 
do not achieve strong secrecy. Our schemes are based on polar codes, a family of 
capacity-achieving linear codes, introduced by Arikan [5], that can be encoded 
and decoded efficiently. Previous work in a quantum setup [B] already implies
that practically efficient one-way secret-key agreement and private channel cod- 
ing in a classical setup is possible, where a practically efficient scheme is one 
whose computational complexity is essentially linear in the blocklength. The 
aim of this paper is to explain the schemes in detail and give a purely classi- 
cal proof that the schemes are reliable, secure, practically efficient and achieve 
optimal rates. 

This paper is structured as follows. Section 2 introduces the problems of
performing one-way secret-key agreement and private channel coding. We
summarize known and new results about the optimal rates for these two problems
for different wiretap channel scenarios. In Section 3, we explain how to obtain
one-way secret-key agreement that is practically efficient, strongly secure,
reliable, and achieves the one-way secret-key rate. However, we are not able to
give an efficient algorithm for code construction, as discussed in Section 3.3.
Section 4 introduces a similar scheme that can be used for strongly secure
private channel coding at the secrecy capacity. Finally, we conclude in
Section 5 and state an open problem that is of interest in the setup of this
paper as well as in the quantum mechanical scenario introduced in [B].

2 Background and Contributions 

2.1 Notation and Definitions 

Let $[k] = \{1, \ldots, k\}$ for $k \in \mathbb{Z}^+$. For $x \in \mathbb{Z}_2^k$
and $I \subseteq [k]$ we have $x[I] = [x_i : i \in I]$,
$x^i = [x_1, \ldots, x_i]$ and $x_j^i = [x_j, \ldots, x_i]$ for $j \le i$. The
set $A^c$ denotes the complement of the set A. The uniform distribution on an
arbitrary random variable X is denoted by $\bar{P}_X$. For distributions P and Q
over the same alphabet $\mathcal{X}$, the variational distance is defined by
$\delta(P, Q) := \frac{1}{2} \sum_{x \in \mathcal{X}} |P(x) - Q(x)|$.
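As a quick numerical illustration of the variational distance, the sketch below computes it for two distributions given as dictionaries mapping outcomes to probabilities. It assumes the usual normalization with a factor of one half (so that the distance lies in [0, 1]), which matches the later remark about half the L1 distance; the function name and dictionary representation are choices of this example.

```python
def variational_distance(P, Q):
    """Half the L1 distance between two pmfs given as {outcome: prob} dicts.
    Outcomes missing from one dict are treated as having probability 0."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(x, 0.0) - Q.get(x, 0.0)) for x in support)
```

For example, a fair coin `{0: 0.5, 1: 0.5}` and a constant `{0: 1.0}` are at distance 0.5, identical distributions are at distance 0, and distributions with disjoint supports are at distance 1.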

Let X and Y be two (possibly correlated) random variables. We use standard
information theoretic notation, such as H(X) for the (Shannon) entropy of X,
H(X,Y) for the joint entropy of (X,Y), H(X|Y) for the conditional entropy of
X given Y, and I(X;Y) for the mutual information between X and Y (see
footnote 1). The notation X -o- Y -o- Z means that the random variables X, Y, Z
form a Markov chain in the given order.

In this setup we consider a discrete memoryless wiretap channel (DM-WTC)
$W : \mathcal{X} \to \mathcal{Y} \times \mathcal{Z}$, which is characterized by
its transition probability distribution $P_{Y,Z|X}$ (see footnote 2). We assume
that the variable X belongs to Alice, Y to Bob and Z to Eve.

According to Körner and Marton [5], a DM-WTC
$W : \mathcal{X} \to \mathcal{Y} \times \mathcal{Z}$ is termed
more capable if $I(X;Y) \ge I(X;Z)$ for every possible distribution on X. The

1 These quantities are properly defined in [7]. 

2 Recall that a discrete channel is defined as a system consisting of an input
alphabet (here $\mathcal{X}$), an output alphabet (here
$\mathcal{Y} \times \mathcal{Z}$) and a transition probability distribution
(here $P_{Y,Z|X}$) between the input and the output. A channel is said to be
memoryless if the probability distribution of the output depends only on the
input at that time and is conditionally independent of previous channel inputs
or outputs.
channel W is termed less noisy if $I(U;Y) \ge I(U;Z)$ for every possible
distribution on (U, X) where U has finite support and U -o- X -o- (Y, Z) form a
Markov chain. If X -o- Y -o- Z form a Markov chain, W is called degraded (see
footnote 3). It has been shown [5] that being more capable is a strictly weaker
condition than being less noisy, which is a strictly weaker condition than being
degraded. Hence, having a DM-WTC W which is degraded implies that W is less
noisy, which again implies that W is also more capable.

2.2 Polarization Phenomenon 

Let $X^N$ be a vector whose entries are i.i.d. Bernoulli(p) distributed for
$p \in [0, 1]$ and $N = 2^n$ where $n \in \mathbb{Z}^+$. Then define
$U^N = G_N X^N$, where $G_N$ denotes the polarization (or polar) transform which
can be represented by the matrix



( 1 ) 


where $A^{\otimes k}$ denotes the kth Kronecker power of an arbitrary matrix A.
Note that $G_N$ is its own inverse. Furthermore, let $Y^N = W^N X^N$, where
$W^N$ denotes N independent uses of a DMC $W : \mathcal{X} \to \mathcal{Y}$. For
$\epsilon \in (0, 1)$ we may define the two sets


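$$\mathcal{R}_\epsilon^N(X|Y) := \{ i \in [N] : H(U_i | U^{i-1}, Y^N) > 1 - \epsilon \} \quad \text{and} \tag{2}$$
$$\mathcal{D}_\epsilon^N(X|Y) := \{ i \in [N] : H(U_i | U^{i-1}, Y^N) < \epsilon \}. \tag{3}$$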

The former consists of outputs $U_i$ which are essentially uniformly random, even
given all previous outputs $U^{i-1}$ as well as $Y^N$, while the latter set
consists of the essentially deterministic outputs. The polarization phenomenon
is that essentially all outputs are in one of these two subsets, and their sizes
are given by the conditional entropy of the input X given Y.
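The transform itself is simple to write down as a Kronecker power of a 2 x 2 binary kernel. The sketch below is illustrative only: since the matrix in (1) is not legible in this copy, it assumes the kernel [[1, 0], [1, 1]] commonly used for polar codes (its transpose would work equally well here) and omits any bit-reversal permutation. It can be used to check the self-inverse property over GF(2) on a small example; all function names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def polar_matrix(n, kernel=((1, 0), (1, 1))):
    """n-th Kronecker power of a 2x2 binary kernel (N = 2**n rows).
    The default kernel is an assumption of this sketch."""
    G = [[1]]
    for _ in range(n):
        G = kron(G, [list(row) for row in kernel])
    return G


def apply_mod2(G, x):
    """Matrix-vector product over GF(2)."""
    return [sum(g * xi for g, xi in zip(row, x)) % 2 for row in G]
```

Applying the transform twice, for instance `apply_mod2(G, apply_mod2(G, x))` with `G = polar_matrix(3)`, returns the original 8-bit vector x, because the square of the kernel is the identity modulo 2.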

Theorem 1 (Polarization Phenomenon [5,9]). For any $\epsilon \in (0, 1)$

$$|\mathcal{R}_\epsilon^N(X|Y)| = N H(X|Y) - o(N) \quad \text{and} \tag{4}$$
$$|\mathcal{D}_\epsilon^N(X|Y)| = N (1 - H(X|Y)) - o(N). \tag{5}$$

Based on this theorem it is possible to construct a family of linear error
correcting codes, called polar codes. The logical bits are encoded into the
$U_i$ for $i \in \mathcal{D}_\epsilon^N(X|Y)$, whereas the inputs to $U_i$ for
$i \in \mathcal{D}_\epsilon^N(X|Y)^c$ are fixed (see footnote 4). It has
been shown that polar codes have several desirable attributes [5,10,11,12]: they
provably achieve the capacity of any DMC; they have an encoding and decoding
3 To call a DM-WTC $W : \mathcal{X} \to \mathcal{Y} \times \mathcal{Z}$ more
capable is an abbreviation meaning that the main channel
$W_1 : \mathcal{X} \to \mathcal{Y}$ is more capable than the eavesdropping
channel $W_2 : \mathcal{X} \to \mathcal{Z}$. The same convention is used for
less noisy and degraded DM-WTCs.

4 These are the so-called frozen bits. 
complexity that is essentially linear in the blocklength N; the error probability
decays exponentially in the square root of the blocklength.

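Non-binary random variables can be represented by a sequence of correlated
binary random variables, which are then encoded separately. Correlated
sequences of binary random variables may be polarized using a multilevel
construction, as shown in [10] (see footnote 5). Given M i.i.d. instances of a
sequence $X = (X_{(1)}, X_{(2)}, \ldots, X_{(K)})$ and possibly a correlated
random variable Y, the basic idea is to first polarize $X_{(1)}^M$ relative to
$Y^M$, then treat $X_{(1)}^M Y^M$ as side information in polarizing $X_{(2)}^M$,
and so on. More precisely, defining $U_{(j)}^M = G_M X_{(j)}^M$ for
$j = 1, \ldots, K$, we may define the random and deterministic sets for each j as

$$\mathcal{R}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y) := \{ i \in [M] : H(U_{(j),i} | U_{(j)}^{i-1}, X_{(j-1)}^M, \ldots, X_{(1)}^M, Y^M) > 1 - \epsilon \}, \quad \text{and} \tag{6}$$
$$\mathcal{D}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y) := \{ i \in [M] : H(U_{(j),i} | U_{(j)}^{i-1}, X_{(j-1)}^M, \ldots, X_{(1)}^M, Y^M) < \epsilon \}. \tag{7}$$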

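In principle we could choose different $\epsilon$ parameters for each j, but
this will not be necessary here. Now, Theorem 1 applies to the random and
deterministic sets for every j. The sets
$\mathcal{R}_\epsilon^M(X|Y) = \{\mathcal{R}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y)\}_{j=1}^K$
and
$\mathcal{D}_\epsilon^M(X|Y) = \{\mathcal{D}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y)\}_{j=1}^K$
have sizes given by

$$|\mathcal{R}_\epsilon^M(X|Y)| = \sum_{j=1}^K |\mathcal{R}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y)| \tag{8}$$
$$= \sum_{j=1}^K \big[ M H(X_{(j)} | X_{(1)}, \ldots, X_{(j-1)}, Y) - o(M) \big] \tag{9}$$
$$= M H(X|Y) - o(KM), \tag{10}$$

and

$$|\mathcal{D}_\epsilon^M(X|Y)| = \sum_{j=1}^K |\mathcal{D}_{\epsilon,(j)}^M(X_{(j)} | X_{(j-1)}, \ldots, X_{(1)}, Y)| \tag{11}$$
$$= \sum_{j=1}^K \big[ M (1 - H(X_{(j)} | X_{(1)}, \ldots, X_{(j-1)}, Y)) - o(M) \big] \tag{12}$$
$$= M (K - H(X|Y)) - o(KM). \tag{13}$$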


In the following we will make use of both the polarization phenomenon in its
original form, Theorem 1, and the multilevel extension. To simplify the
presentation, we denote by $\tilde{G}_M^K$ the K parallel applications of $G_M$
to the K random variables $X_{(j)}^M$.

5 An alternative approach is given in [13,14], where the polarization phenomenon
has been generalized for arbitrary finite fields. We will however focus on the
multilevel construction in this paper.
2.3 One-Way Secret-Key Agreement

At the start of the one-way secret-key agreement protocol, Alice, Bob, and Eve
share $N = 2^n$, $n \in \mathbb{Z}^+$, i.i.d. copies $(X^N, Y^N, Z^N)$ of a
triple of correlated random variables (X, Y, Z) which take values in discrete
but otherwise arbitrary alphabets $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$
(see footnote 6).

Alice starts the protocol by performing an operation
$\tau_A : \mathcal{X}^N \to (\mathcal{S}^J, \mathcal{C})$ on $X^N$ which outputs
both her secret key $S_A^J \in \mathcal{S}^J$ and an additional random variable
$C \in \mathcal{C}$ which she transmits to Bob over a public but noiseless
channel. Bob then performs an operation
$\tau_B : (\mathcal{Y}^N, \mathcal{C}) \to \mathcal{S}^J$ on $Y^N$ and the
information C he received from Alice to obtain a vector
$S_B^J \in \mathcal{S}^J$, his secret key. The secret key thus produced should
be reliable, i.e., satisfy the

reliability condition: $\lim_{N \to \infty} \Pr[S_A^J \ne S_B^J] = 0$, (14)

and secure, i.e., satisfy the

(strong) secrecy condition: $\lim_{N \to \infty} \| P_{S_A^J, Z^N, C} - \bar{P}_{S_A^J} \times P_{Z^N, C} \|_1 = 0$, (15)

where $\bar{P}_{S_A^J}$ denotes the uniform distribution on random variable $S_A^J$.

Historically, secrecy was first characterized by a (weak) secrecy condition of
the form

$\lim_{N \to \infty} \frac{1}{N} I(S_A^J; Z^N, C) = 0$. (16)

Maurer and Wolf showed that (16) is not a sufficient secrecy criterion [15,16]
and introduced the strong secrecy condition

$\lim_{N \to \infty} I(S_A^J; Z^N, C) = 0$, (17)

where in addition it is required that the key is uniformly distributed, i.e.,

$\lim_{N \to \infty} \delta(P_{S_A^J}, \bar{P}_{S_A^J}) = 0$. (18)

In recent years, the strong secrecy condition (17), (18) has often been replaced
by (15), since (half) the $L_1$ distance directly bounds the probability of
distinguishing the actual key produced by the protocol from an ideal key. This
operational interpretation is particularly helpful in the finite blocklength
regime. In the limit $N \to \infty$, the two secrecy conditions (15) and (17)
are equivalent, which can be shown using Pinsker's and Fano's inequalities.

Since having weak secrecy is not sufficient, we will only consider strong se- 
crecy in this paper. It has been proven that each secret-key agreement protocol 
which achieves weak secrecy can be transformed into a strongly secure protocol 
m ■ However, it is not clear whether the resulting protocol is guaranteed to be 
practically efficient. 

6 The correlation of the random variables (X, Y, Z) is described by their joint
probability distribution $P_{X,Y,Z}$.
For one-way communication, Csiszár and Körner [3] and later Ahlswede and
Csiszár [17] showed that the optimal rate $R := \lim_{N \to \infty} J/N$ of
generating a secret key satisfying (14) and (17), called the secret-key rate
$S_\to(X;Y|Z)$, is characterized by a closed single-letter formula.

Theorem 2 (One-Way Secret-Key Rate [4,17]). For triples (X, Y, Z) described by
$P_{X,Y,Z}$ as explained above,

$$S_\to(X;Y|Z) = \begin{cases} \max_{P_{U,V}} & H(U|Z,V) - H(U|Y,V) \\ \text{s.t.} & V \text{-o-} U \text{-o-} X \text{-o-} (Y,Z), \\ & |\mathcal{V}| \le |\mathcal{X}|,\ |\mathcal{U}| \le |\mathcal{X}|^2. \end{cases} \tag{19}$$

The expression for the one-way secret-key rate given in Theorem 2 can be
simplified if one makes additional assumptions about $P_{X,Y,Z}$.

Corollary 3. For $P_{X,Y,Z}$ such that the induced DM-WTC W described by
$P_{Y,Z|X}$ is more capable,

$$S_\to(X;Y|Z) = \begin{cases} \max_{P_V} & H(X|Z,V) - H(X|Y,V) \\ \text{s.t.} & V \text{-o-} X \text{-o-} (Y,Z), \\ & |\mathcal{V}| \le |\mathcal{X}|. \end{cases} \tag{20}$$

Proof. In terms of the mutual information, we have

$$H(U|Z,V) - H(U|Y,V) = I(U;Y|V) - I(U;Z|V) \tag{21}$$
$$= I(X,U;Y|V) - I(X,U;Z|V) - \big(I(X;Y|U,V) - I(X;Z|U,V)\big) \tag{22}$$
$$\le I(X,U;Y|V) - I(X,U;Z|V) \tag{23}$$
$$= I(X;Y|V) - I(X;Z|V), \tag{24}$$

using the chain rule, the more capable condition, and the Markov chain
properties, respectively. Thus, the maximum in $S_\to(X;Y|Z)$ can be achieved
when omitting U. □

Corollary 4. For $P_{X,Y,Z}$ such that the induced DM-WTC W described by
$P_{Y,Z|X}$ is less noisy,

$$S_\to(X;Y|Z) = H(X|Z) - H(X|Y). \tag{25}$$

Proof. Since W being less noisy implies W being more capable, we know that
the one-way secret-key rate is given by (20). Using the chain rule we obtain

$$H(X|Z,V) - H(X|Y,V) = I(X;Y|V) - I(X;Z|V) \tag{26}$$
$$= I(X,V;Y) - I(X,V;Z) - I(V;Y) + I(V;Z) \tag{27}$$
$$= I(X;Y) - I(X;Z) - \big(I(V;Y) - I(V;Z)\big) \tag{28}$$
$$\le I(X;Y) - I(X;Z). \tag{29}$$

Equation (28) follows from the chain rule and the Markov chain condition. The
inequality uses the assumption of being less noisy. □
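To see the closed-form rate H(X|Z) - H(X|Y) on concrete numbers, the sketch below evaluates it for a toy joint distribution. The example is an assumption of this illustration, not taken from the paper: X is a uniform bit, Bob and Eve each see X through an independent binary symmetric channel, and Eve's crossover probability is larger than Bob's, a setting in which the less noisy condition is expected to hold. All function names are hypothetical.

```python
import math
from collections import defaultdict


def conditional_entropy(joint, target, given):
    """H(target | given) in bits for a joint pmf {(x, y, z): prob};
    target and given are tuples of coordinate indices."""
    pair = defaultdict(float)
    marg = defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        pair[(t, g)] += p
        marg[g] += p
    return -sum(p * math.log2(p / marg[g])
                for (t, g), p in pair.items() if p > 0)


def less_noisy_key_rate(joint):
    """Evaluate H(X|Z) - H(X|Y) with coordinates ordered (X, Y, Z)."""
    return (conditional_entropy(joint, (0,), (2,))
            - conditional_entropy(joint, (0,), (1,)))


def bsc_wiretap_joint(p_bob, p_eve):
    """Toy joint pmf: uniform X, Y = X through BSC(p_bob), Z = X through BSC(p_eve)."""
    joint = {}
    for x in (0, 1):
        for y in (0, 1):
            for z in (0, 1):
                py = p_bob if y != x else 1 - p_bob
                pz = p_eve if z != x else 1 - p_eve
                joint[(x, y, z)] = 0.5 * py * pz
    return joint
```

For crossover probabilities 0.1 (Bob) and 0.3 (Eve), the computed rate equals the difference of the two binary entropies, h(0.3) - h(0.1), which is positive because Eve's observation is noisier.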
Note that (25) is also equal to the one-way secret-key rate for the case where
W is degraded, as this implies W being less noisy. The proof of Theorem 2 does
not imply that there exists an efficient one-way secret-key agreement protocol.
A computationally efficient scheme was constructed in [18], but is not known to
be practically efficient (see footnote 7).

For key agreement with two-way communication, no formula comparable to
(19) for the optimal rate is known. However, it has been shown that the two-way
secret-key rate is strictly larger than the one-way secret-key rate. It is also
known that the intrinsic information
$I(X;Y \downarrow Z) := \min_{P_{Z'|Z}} I(X;Y|Z')$ is an upper bound on
S(X;Y|Z), but is not tight [17,19,20].


2.4 Private Channel Coding 

Private channel coding over a wiretap channel is closely related to the task of one-way
secret-key agreement from common randomness (cf. Section 2.3). Here Alice
would like to transmit a message $M^J \in \mathcal{M}^J$ privately to Bob. The messages can
be distributed according to some arbitrary distribution $P_{M^J}$. To do so, she first
encodes the message by computing $X^N = \mathrm{enc}(M^J)$ for some encoding function
$\mathrm{enc} : \mathcal{M}^J \to \mathcal{X}^N$ and then sends $X^N$ over the wiretap channel to Bob (and
to Eve), which is represented by $(Y^N, Z^N) = W^N X^N$. Bob next decodes the
received message to obtain a guess for Alice’s message $\hat{M}^J = \mathrm{dec}(Y^N)$ for some
decoding function $\mathrm{dec} : \mathcal{Y}^N \to \mathcal{M}^J$. As in secret-key agreement, the private
channel coding scheme should be reliable, i.e., satisfy the

reliability condition: $\lim_{J\to\infty} \Pr[M^J \neq \hat{M}^J] = 0$, for all $M^J \in \mathcal{M}^J$ (30)

and (strongly) secure, i.e., satisfy the

(strong) secrecy condition: $\lim_{J\to\infty} \| P_{M^J,Z^N,C} - P_{M^J} \times P_{Z^N,C} \|_1 = 0.$ (31)

The variable $C$ denotes any additional information made public by the protocol.

As mentioned in Section [2711 in the limit $J \to \infty$ this strong secrecy condition
is equivalent to the historically older (strong) secrecy condition

$\lim_{J\to\infty} I(M^J; Z^N, C) = 0.$ (32)

The highest achievable rate $R := \lim_{N\to\infty} \frac{J}{N}$ fulfilling (30) and (31) is called the
secrecy capacity.

Csiszár and Körner showed [4, Corollary 2] that there exists a single-letter
formula for the secrecy capacity.$^8$

7 As defined in Section 1 we call a scheme practically efficient if its computational
complexity is essentially linear in the blocklength.

8 Maurer and Wolf showed that the single-letter formula remains valid considering
strong secrecy [16].

Theorem 5 (Secrecy Capacity [4]). For an arbitrary DM-WTC $W$ as introduced
above,

$$C_s = \begin{cases} \max_{P_{V,X}} & H(V|Z) - H(V|Y) \\ \text{s.t.} & V \mathrel{-\!\circ\!-} X \mathrel{-\!\circ\!-} (Y,Z). \end{cases} \tag{33}$$

This expression can be simplified using additional assumptions about $W$.

Corollary 6 (0). If $W$ is more capable,

$$C_s = \max_{P_X} H(X|Z) - H(X|Y). \tag{34}$$

Proof. A proof can be found in [5] or [2D Section 22.1]. □
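To make (34) concrete, the sketch below evaluates $\max_{P_X} H(X|Z) - H(X|Y)$ by a grid search over binary input distributions. It is only an illustration under stated assumptions: the input alphabet is binary, the two marginal channels are binary symmetric channels chosen for the example, and the function names (`h_input_given_output`, `secrecy_capacity_more_capable`) are hypothetical. The grid search is a convenience, not a method proposed in the paper.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def h_input_given_output(p_one, channel):
    """H(X | output) for binary X with P(X=1) = p_one and channel[x][out] = P(out | x)."""
    p_x = [1 - p_one, p_one]
    n_out = len(channel[0])
    p_out = [sum(p_x[x] * channel[x][o] for x in range(2)) for o in range(n_out)]
    h_out_given_x = sum(p_x[x] * entropy(channel[x]) for x in range(2))
    # H(X|O) = H(X) + H(O|X) - H(O)
    return entropy(p_x) + h_out_given_x - entropy(p_out)


def secrecy_capacity_more_capable(w_bob, w_eve, steps=1000):
    """Grid search for max over P_X of H(X|Z) - H(X|Y), cf. (34)."""
    candidates = (
        (h_input_given_output(k / steps, w_eve) - h_input_given_output(k / steps, w_bob), k / steps)
        for k in range(steps + 1)
    )
    return max(candidates)


def bsc(q):
    """Binary symmetric channel with crossover probability q."""
    return [[1 - q, q], [q, 1 - q]]


def h2(p):
    return entropy([p, 1 - p])


c_s, best_p = secrecy_capacity_more_capable(bsc(0.05), bsc(0.2))
```

In this example the maximum is attained at the uniform input, where the expression reduces to $h(0.2) - h(0.05)$ with $h$ the binary entropy.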

2.5 Previous Work and Our Contributions 

In Section 3 we present a one-way secret-key agreement scheme based on polar
codes that achieves the secret-key rate, is strongly secure and reliable, and whose
implementation is practically efficient, with complexity $O(N \log N)$ for blocklength
$N$. Our protocol improves previous efficient secret-key constructions [22],
where only weak secrecy could be proven and where the eavesdropper has no
prior knowledge and/or degradability assumptions are required. Our protocol
also improves a very recent efficient secret-key construction m, which requires
Alice and Bob to share a small amount of key and only works for
binary degraded (symmetric) discrete memoryless sources. However, we note that
a possible drawback of our scheme compared to [22] is that its code construction
may be more difficult.

In Section 4 we introduce a coding scheme based on polar codes that provably
achieves the secrecy capacity for arbitrary discrete memoryless wiretap
channels. We show that the complexity of the encoding and decoding operations
is $O(N \log N)$ for blocklength $N$. Our scheme improves previous work on
practically efficient private channel coding at the optimal rate m, where only
weak secrecy could be proven under the additional assumption that the channel
$W$ is degraded.$^9$ Recently, Bellare et al. introduced a polynomial-time coding
scheme that is strongly secure and achieves the secrecy capacity for binary symmetric
wiretap channels m.$^{10}$ Several other constructions of private channel
coding schemes have been reported [26,27,28], but all achieve only weak secrecy.
Very recently, Şaşoğlu and Vardy introduced a new polar coding scheme that

9 Note that Mahdavifar and Vardy showed that their scheme achieves strong secrecy
if the channel to Eve (induced from $W$) is noiseless. Otherwise their scheme is not
provably reliable m.

10 They claim that their scheme works for a large class of wiretap channels. However,
this class has not been characterized precisely so far. It is therefore not clear whether
their scheme requires, for example, degradability assumptions. Note that to obtain
strong secrecy for an arbitrarily distributed message, it is required that the wiretap
channel is symmetric [25, Lemma 14].

can be used for private channel coding being strongly secure m. However, it
still requires the assumption of having a degraded wiretap channel, which we
do not need for our scheme. In EQJ, an explicit construction that achieves the
secrecy capacity for wiretap channel coding is introduced, but efficiency is not
considered.

The tasks of one-way secret-key agreement and private channel coding explained
in the previous two subsections are closely related. Maurer showed how
a one-way secret-key agreement can be derived from a private channel coding
scenario [2]. More precisely, he showed how to obtain the common randomness
needed for one-way secret-key agreement by constructing a “virtual” degraded
wiretap channel from Alice to Bob. This approach can be used to obtain the
one-way secret-key rate from the secrecy capacity result in the wiretap channel
scenario EU Section 22.4.3]. One of the main advantages of the two schemes
introduced in this paper is that they are both practically efficient. However, even
given a practically efficient private coding scheme, it is not known whether Maurer’s
construction will yield a practically efficient scheme for secret-key agreement. For
this reason, as well as simplicity of presentation, we treat the one-way secret-key
agreement and the private channel coding problem separately in the two sections
to follow.

3 One-Way Secret-Key Agreement Scheme 

Our key agreement protocol is a concatenation of two subprotocols, an inner
and an outer layer, as depicted in Figure 1. The protocol operates on blocks
of $N$ i.i.d. triples $(X, Y, Z)$, which are divided into $M$ sub-blocks of size $L$ for
input to the inner layer. At the outer layer, we use the multi-level construction
introduced in Section 2.2. In the following we assume $\mathcal{X} = \{0, 1\}$, which however
is only for convenience; the techniques of m and m can be used to generalize
the schemes to arbitrary alphabets $\mathcal{X}$.

The task of the inner layer is to perform information reconciliation and that 
of the outer layer is to perform privacy amplification. Information reconciliation 
refers to the process of carrying out error correction to ensure that Alice and 
Bob obtain a shared bit string, and here we only allow communication from 
Alice to Bob for this purpose. On the other hand, privacy amplification refers to 
the process of distilling from Alice’s and Bob’s shared bit string a smaller set of 
bits whose correlation with the information available to Eve is below a desired 
threshold. 

Each subprotocol in our scheme is based on the polarization phenomenon.
For information reconciliation of Alice’s random variable $X^L$ relative to Bob’s
information $Y^L$, Alice applies a polar transformation to $X^L$ and forwards the
bits of the complement of the deterministic set $\mathcal{D}^{\epsilon_1}_L(X|Y)$ to Bob over a public
channel, which enables him to recover $X^L$ using the standard polar decoder [5].
Her remaining information is then fed into a multilevel polar transformation and
the bits of the random set are kept as the secret key.

Let us now define the protocol more precisely. For $L = 2^{\ell}$, $\ell \in \mathbb{Z}^+$, let
$V^L := G_L X^L$ where $G_L$ is as defined in ©. For $\epsilon_1 > 0$, we define

$$\mathcal{E}_K := \mathcal{D}^{\epsilon_1}_L(X|Y), \tag{35}$$

with $K := |\mathcal{D}^{\epsilon_1}_L(X|Y)|$. Then, let $T_{(j)} = V^L[\mathcal{E}_K]_j$ for $j = 1, \ldots, K$ and
$C_{(j)} = V^L[\mathcal{E}_K^c]_j$ for $j = 1, \ldots, L - K$ so that $T = (T_{(1)}, \ldots, T_{(K)})$ and
$C = (C_{(1)}, \ldots, C_{(L-K)})$. For $\epsilon_2 > 0$ and $U^M_{(j)} = G_M T^M_{(j)}$ for $j = 1, \ldots, K$ (or, more briefly,
$U^M = \tilde{G}^K_M T^M$), we define

$$\mathcal{F}_J := \mathcal{R}^{\epsilon_2}_M(T|CZ^L), \tag{36}$$

with $J := |\mathcal{R}^{\epsilon_2}_M(T|CZ^L)|$.
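The transformation $V^L = G_L X^L$ can be sketched in a few lines of Python. The sketch assumes that $G_L$ is the $\ell$-fold Kronecker power of the 2×2 kernel [[1, 1], [0, 1]] acting on column vectors over GF(2), without a bit-reversal permutation; the defining equation is not legible in this copy, so this convention and the function name `polar_transform` are assumptions for illustration.

```python
def polar_transform(bits):
    """Apply the assumed G_L (Kronecker power of [[1, 1], [0, 1]]) over GF(2).

    The butterfly structure uses (L / 2) * log2(L) XOR operations.
    """
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    v = list(bits)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                v[i] ^= v[i + step]  # upper branch receives the XOR of both halves
        step *= 2
    return v


example = polar_transform([1, 0, 1, 1])
```

Under this convention the transform is an involution, so applying `polar_transform` twice returns the input, which matches the remark in the text that $G_L$ is its own inverse.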


Protocol 1: One-way secret-key agreement

Given: Index sets $\mathcal{E}_K$ and $\mathcal{F}_J$ (code construction)

Notation: Alice’s input: $x^N \in \mathcal{X}^N$ (a realization of $X^N$)
          Bob’s / Eve’s input: $(y^N, z^N)$ (realizations of $Y^N$ and $Z^N$)
          Alice’s output: $s_A^J$
          Bob’s output: $s_B^J$

Step 1: Alice computes $v_{i+1}^{i+L} = G_L x_{i+1}^{i+L}$ for all $i \in \{0, L, 2L, \ldots, (M-1)L\}$.

Step 2: Alice computes $t_{(i/L)+1} = v_{i+1}^{i+L}[\mathcal{E}_K]$ for all $i \in \{0, L, 2L, \ldots, (M-1)L\}$.

Step 3: Alice sends $c_{(i/L)+1} = v_{i+1}^{i+L}[\mathcal{E}_K^c]$ for all $i \in \{0, L, 2L, \ldots, (M-1)L\}$ over a public channel to Bob.

Step 4: Alice computes $u^M = \tilde{G}^K_M t^M$ and obtains $s_A^J = u^M[\mathcal{F}_J]$.$^{11}$

Step 5: Bob applies the standard polar decoder [5,12] to $(c_{(i/L)+1}, y_{i+1}^{i+L})$ to obtain $\hat{v}_{i+1}^{i+L}$ and $\hat{t}_{(i/L)+1} = \hat{v}_{i+1}^{i+L}[\mathcal{E}_K]$, for $i \in \{0, L, 2L, \ldots, (M-1)L\}$.

Step 6: Bob computes $\hat{u}^M = \tilde{G}^K_M \hat{t}^M$ and obtains $s_B^J = \hat{u}^M[\mathcal{F}_J]$.
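Alice’s part of the protocol (Steps 1 to 4) can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the index sets passed in are placeholders rather than sets computed from a distribution, the transform convention is the assumed Kronecker power of [[1, 1], [0, 1]] without bit reversal, indices are zero-based, and $\mathcal{F}_J$ is represented as a list of hypothetical pairs (level `j`, position `m`), in line with footnote 11.

```python
def polar_transform(bits):
    """Assumed G_L: Kronecker power of [[1, 1], [0, 1]] over GF(2), no bit reversal."""
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    v = list(bits)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                v[i] ^= v[i + step]
        step *= 2
    return v


def alice_key_agreement(x, L, E_K, F_J):
    """Return (key bits, public messages) for Alice's input x of length N = L * M."""
    M = len(x) // L
    in_E = set(E_K)
    E_c = [j for j in range(L) if j not in in_E]
    t_blocks, c_blocks = [], []
    for b in range(M):
        v = polar_transform(x[b * L:(b + 1) * L])   # Step 1: inner transform per block
        t_blocks.append([v[j] for j in E_K])        # Step 2: bits kept for the outer layer
        c_blocks.append([v[j] for j in E_c])        # Step 3: bits sent over the public channel
    # Step 4: one length-M transform per level j, then keep the positions listed in F_J
    u = [polar_transform([t_blocks[b][j] for b in range(M)]) for j in range(len(E_K))]
    key = [u[j][m] for (j, m) in F_J]
    return key, c_blocks


# Toy parameters of Fig. 1 (N = 8, L = 4, M = 2, K = 2, J = 2) with placeholder index sets.
key, public = alice_key_agreement([1, 0, 1, 1, 0, 1, 1, 0], 4, [2, 3], [(0, 0), (1, 0)])
```

Bob’s side additionally needs a successive cancellation decoder for each block, which is omitted here.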


3.1 Rate, Reliability, Secrecy, and Efficiency 

Theorem 7. Protocol 1 allows Alice and Bob to generate a secret key $S_A^J$
respectively $S_B^J$ using public one-way communication $C^M$ such that for any $\beta < \frac{1}{2}$

Reliability: $\Pr[S_A^J \neq S_B^J] = O\big(M 2^{-L^{\beta}}\big)$ (37)

Secrecy: $\| P_{S_A^J,Z^N,C} - \bar{P}_{S_A^J} \times P_{Z^N,C} \|_1 = O\big(\sqrt{N}\, 2^{-N^{\beta}/2}\big)$ (38)

Rate: $R := \frac{J}{N} = H(X|Z) - \frac{1}{L} H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big) - \frac{o(N)}{N}$. (39)

All operations by both parties can be performed in $O(N \log N)$ steps.



11 The expression $u^M[\mathcal{F}_J]$ is an abuse of notation, as $\mathcal{F}_J$ is not a subset of $[M]$. The
expression should be understood to be the union of the random bits of $u^M_{(j)}$, for all
$j = 1, \ldots, K$, as in the definition of $\mathcal{R}^{\epsilon_2}_M(T|CZ^L)$.





Fig. 1. The secret-key agreement scheme for the setup $N = 8$, $L = 4$, $M = 2$, $K = 2$,
and $J = 2$. We consider a source that produces $N$ i.i.d. copies $(X^N, Y^N, Z^N)$ of a
triple of correlated random variables $(X, Y, Z)$. Alice performs the operation $\tau_A$, sends
$(V^L[\mathcal{E}_K^c])^M$ over a public channel to Bob and obtains $S_A^J$, her secret key. Bob then
performs the operation $\tau_B$ which results in his secret key $S_B^J$.


Proof. The reliability of Alice’s and Bob’s key follows from the standard polar
decoder error probability and the union bound. Each instance of the decoding
algorithm employed by Bob has an error probability which scales as $O(2^{-L^{\beta}})$ for
any $\beta < \frac{1}{2}$ 0; application of the union bound gives the prefactor $M$. Since $G_L$
as defined in JTJ| is its own inverse, $\tilde{G}^K_M$ is its own inverse as well.

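The rate of the scheme is

$$\begin{aligned}
R &= \frac{J}{N} && (40) \\
&= \frac{1}{L} H\big(V^L[\mathcal{E}_K] \,\big|\, V^L[\mathcal{E}_K^c], Z^L\big) - \frac{o(N)}{N} && (41) \\
&= \frac{1}{L}\Big(H(V^L|Z^L) - H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big)\Big) - \frac{o(N)}{N} && (42) \\
&= H(X|Z) - \frac{1}{L} H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big) - \frac{o(N)}{N}, && (43)
\end{aligned}$$

where (41) uses the polarization phenomenon stated in Theorem 1.

To prove the secrecy statement requires more effort. Using Pinsker’s inequality
we obtain

$$\begin{aligned}
\delta\big(P_{S_A^J,Z^N,C^M},\, \bar{P}_{S_A^J} \times P_{Z^N,C^M}\big) &\le \sqrt{\tfrac{\ln 2}{2}\, D\big(P_{S_A^J,Z^N,C^M} \,\big\|\, \bar{P}_{S_A^J} \times P_{Z^N,C^M}\big)} && (44) \\
&= \sqrt{\tfrac{\ln 2}{2}\big(J - H(S_A^J|Z^N,C^M)\big)}, && (45)
\end{aligned}$$

where the last step uses the chain rule for relative entropies and that $\bar{P}_{S_A^J}$ denotes
the uniform distribution. We can simplify the conditional entropy expression
using the chain rule

$$\begin{aligned}
H(S_A^J|Z^N,C^M) &= H\big(U^M[\mathcal{F}_J] \,\big|\, Z^N, (V^L[\mathcal{E}_K^c])^M\big) && (46) \\
&= \sum_{j=1}^{K} H\big(U^M_{(j)}[\mathcal{F}_{(j)}] \,\big|\, U^M_{(j-1)}[\mathcal{F}_{(j-1)}], \ldots, U^M_{(1)}[\mathcal{F}_{(1)}], Z^N, (V^L[\mathcal{E}_K^c])^M\big) && (47) \\
&= \sum_{j=1}^{K} \sum_{i=1}^{|\mathcal{F}_{(j)}|} H\big(U^M_{(j)}[\mathcal{F}_{(j)}]_i \,\big|\, U^M_{(j)}[\mathcal{F}_{(j)}]^{i-1}, U^M_{(j-1)}[\mathcal{F}_{(j-1)}], \ldots, U^M_{(1)}[\mathcal{F}_{(1)}], Z^N, (V^L[\mathcal{E}_K^c])^M\big) && (48) \\
&\ge \sum_{j=1}^{K} \sum_{i \in \mathcal{F}_{(j)}} H\big(U_{(j)i} \,\big|\, U_{(j)}^{i-1}, U^M_{(j-1)}, \ldots, U^M_{(1)}, Z^N, (V^L[\mathcal{E}_K^c])^M\big) && (49) \\
&\ge J(1 - \epsilon_2), && (50)
\end{aligned}$$

where the first inequality uses the fact that conditioning cannot increase
the entropy and the second inequality follows by the definition of $\mathcal{F}_J$. Recall
that we are using the notation introduced in Section 2.2. For $\mathcal{F}_J$ as defined in
(36), we have $\mathcal{F}_J = \{\mathcal{F}_{(j)}\}_{j=1}^{K}$ where $\mathcal{F}_{(j)} = \mathcal{R}^{\epsilon_2}_M(T_{(j)} | T_{(j-1)}, \ldots, T_{(1)}, C, Z^L)$.
The polarization phenomenon, Theorem 1, implies $J = O(N)$, which together
with (45) and (50) proves the secrecy statement of Theorem 7, since $\epsilon_2 = O(2^{-N^{\beta}})$ for
any $\beta < \frac{1}{2}$.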
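The two steps (44) and (45) can be checked numerically on a toy example. The sketch below assumes a single key bit ($J = 1$) and a binary observation for Eve; the joint distribution and the function names are invented for illustration. It computes the variational distance $\delta$ (half the 1-norm), the relative entropy in bits, and the Pinsker bound $\sqrt{(\ln 2 / 2) D}$.

```python
import math


def variational_distance(p, q):
    """delta(P, Q) = (1/2) * sum |P - Q| over the union of supports."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def relative_entropy_bits(p, q):
    """D(P || Q) in bits; assumes Q is positive wherever P is."""
    return sum(v * math.log2(v / q[k]) for k, v in p.items() if v > 0)


# Toy joint distribution of (key bit S, Eve's observation Z): an assumption for this example.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
p_z = {z: sum(v for (_, zz), v in joint.items() if zz == z) for z in (0, 1)}

# Ideal distribution: uniform key bit, independent of Z.
ideal = {(s, z): 0.5 * p_z[z] for s in (0, 1) for z in (0, 1)}

delta = variational_distance(joint, ideal)
divergence = relative_entropy_bits(joint, ideal)
h_s_given_z = -sum(v * math.log2(v / p_z[z]) for (_, z), v in joint.items() if v > 0)
pinsker_bound = math.sqrt(math.log(2) / 2 * divergence)
```

Here `divergence` equals $J - H(S|Z)$ with $J = 1$, as used in (45), and `delta` does not exceed `pinsker_bound`.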

It remains to show that the computational complexity of the scheme is
$O(N \log N)$. Alice performs the operation $G_L$ in the first layer $M$ times, each
requiring $O(L \log L)$ steps [5]. In the second layer she performs $\tilde{G}^K_M$, or $K$ parallel
instances of $G_M$, requiring $O(KM \log M)$ total steps. From the polarization
phenomenon, we have $K = O(L)$, and thus the complexity of Alice’s operations
is not worse than $O(N \log N)$. Bob runs $M$ standard polar decoders which can be
done in $O(ML \log L)$ complexity [5,12]. Bob next performs the polar transform
$\tilde{G}^K_M$ whose complexity is not worse than $O(N \log N)$ as justified above. Thus,
the complexity of Bob’s operations is also not worse than $O(N \log N)$. □
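The counting argument for Alice’s operations can be made explicit with a small sketch. It assumes the usual butterfly implementation of a length-$n$ polar transform, which uses $(n/2)\log_2 n$ XOR operations; the function names and the example parameters are assumptions for illustration.

```python
def transform_xor_count(block_len):
    """XORs used by a butterfly polar transform of a power-of-two length."""
    if block_len <= 0 or block_len & (block_len - 1):
        raise ValueError("block length must be a power of two")
    log2_len = block_len.bit_length() - 1
    return (block_len // 2) * log2_len


def alice_xor_count(L, M, K):
    """M inner transforms of length L plus K outer transforms of length M."""
    return M * transform_xor_count(L) + K * transform_xor_count(M)


# Example parameters (assumed): L = M = 1024 and K = 512, so N = L * M = 2**20.
total = alice_xor_count(1024, 1024, 512)
single_transform = transform_xor_count(1024 * 1024)
```

Since $K \le L$, the total never exceeds $(N/2)\log_2 N$, the cost of one transform of the full blocklength $N = LM$.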

In principle, the two parameters $L$ and $M$ can be chosen freely. However, to
maintain the reliability of the scheme (cf. (37)), $M$ may not grow exponentially
fast in $L$. A reasonable choice would be to have both parameters scale comparably
fast, i.e., $\frac{M}{L} = O(1)$.

Corollary 8. The rate of Protocol 1 given in Theorem 7 can be bounded as

$$R \ge \max\Big\{0,\; H(X|Z) - H(X|Y) - \frac{o(N)}{N}\Big\}. \tag{51}$$

Proof. According to (43) the rate of Protocol 1 is

$$\begin{aligned}
R &= H(X|Z) - \frac{1}{L} H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big) - \frac{o(N)}{N} && (52) \\
&\ge \max\Big\{0,\; H(X|Z) - \frac{|\mathcal{E}_K^c|}{L} - \frac{o(N)}{N}\Big\} && (53) \\
&= \max\Big\{0,\; H(X|Z) - H(X|Y) - \frac{o(N)}{N}\Big\}, && (54)
\end{aligned}$$

where (54) uses the polarization phenomenon stated in Theorem 1. □


3.2 Achieving the Secret-Key Rate of a Given Distribution 

Theorem 7 together with Corollaries 4 and 8 immediately imply that Protocol 1
achieves the secret-key rate $S_{\rightarrow}(X;Y|Z)$ if $P_{X,Y,Z}$ is such that the induced
DM-WTC $W$ described by $P_{Y,Z|X}$ is less noisy. If we can solve the optimization
problem (19), i.e., find the optimal auxiliary random variables $V$ and $U$, our
one-way secret-key agreement scheme can achieve $S_{\rightarrow}(X;Y|Z)$ for a general
setup. We then make $V$ public, replace $X$ by $U$ and run Protocol 1. Note that
finding the optimal random variables $V$ and $U$ might be difficult. It has been
shown that for certain distributions the optimal random variables $V$ and $U$ can
be found analytically [15].

An open problem discussed in Section 5 addresses the question of whether Protocol 1
can achieve a rate that is strictly larger than $\max\{0, H(X|Z) - H(X|Y)\}$ if
nothing about the optimal auxiliary random variables $V$ and $U$ is known, i.e., if
we run the protocol directly for $X$ without making $V$ public.

3.3 Code Construction 

To construct the code the index sets $\mathcal{E}_K$ and $\mathcal{F}_J$ need to be determined. The set
$\mathcal{E}_K$ can be computed approximately with a linear-time algorithm introduced in
m, given the distributions $P_X$ and $P_{Y|X}$. Alternatively, Tal and Vardy’s older
algorithm [33] and its adaptation to the asymmetric setup m can be used.

To approximately compute the outer index set $\mathcal{F}_J$ requires more effort. In
principle, we can again use the above algorithms, which require a description
of the “super-source” seen by the outer layer, i.e., the source which outputs
the triple of random variables $(V^L[\mathcal{E}_K], (Y^L, V^L[\mathcal{E}_K^c]), (Z^L, V^L[\mathcal{E}_K^c]))$. However,
its alphabet size is exponential in $L$, and thus such a direct approach will not
be efficient in the overall blocklength $N$. Nonetheless, due to the structure of
the inner layer, it is perhaps possible that the method of approximation by
limiting the alphabet size [33,32] can be extended to this case. In particular,
a recursive construction motivated by the decoding operation introduced in [3]
could potentially lead to an efficient computation of the index set $\mathcal{F}_J$.




4 Private Channel Coding Scheme 

Our private channel coding scheme is a simple modification of the secret-key
agreement protocol of the previous section. Again it consists of two layers, an
inner layer which ensures transmitted messages can be reliably decoded by the
intended receiver, and an outer layer which guarantees privacy from the unintended
receiver. The basic idea is to simply run the key agreement scheme in
reverse, inputting messages to the protocol where secret key bits would be output
in key agreement. The immediate problem in doing so is that key agreement
also produces outputs besides the secret key, so the procedure is not immediately
reversible. To overcome this problem, the encoding operations here simulate the
random variables output in the key agreement protocol, and then perform the
polar transformations $\tilde{G}^K_M$ and $G_L$ in reverse.$^{12}$

The scheme is visualized in Figure 2 and described in detail in Protocol 2.
Not explicitly shown is the simulation of the bits $U^M[\mathcal{F}_J^c]$ at the outer layer
and the bits $V^L[\mathcal{E}_K^c]$ at the inner layer. The outer layer, whose simulated bits
are nearly deterministic, makes use of the method described in [33, Definition
1], while the inner layer, whose bits are nearly uniformly distributed, follows
[T21 Section 4]. Both proceed by successively sampling from the individual bit
distributions given all previous values in the particular block, i.e., constructing
$V_j$ by sampling from $P_{V_j|V^{j-1}}$. These distributions can be efficiently constructed,
as described in Section 4.3.
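The successive sampling step can be sketched generically. The sketch assumes that a callable returning $P(V_j = 1 \mid V^{j-1})$ is available; how to obtain these conditionals efficiently is the subject of the code construction and is not modeled here. The names `sample_successively` and `alternating` are hypothetical.

```python
import random


def sample_successively(length, prob_one_given_prefix, rng):
    """Draw V_1, ..., V_length one bit at a time.

    prob_one_given_prefix(prefix) must return P(V_j = 1 | V^{j-1} = prefix),
    where prefix is the tuple of bits sampled so far.
    """
    bits = []
    for _ in range(length):
        p_one = prob_one_given_prefix(tuple(bits))
        bits.append(1 if rng.random() < p_one else 0)
    return bits


def alternating(prefix):
    """Fully deterministic toy conditional (assumption): bits alternate 1, 0, 1, 0, ..."""
    return 1.0 if len(prefix) % 2 == 0 else 0.0


sampled = sample_successively(4, alternating, random.Random(0))
```

With nearly deterministic conditionals, as at the outer layer, almost all sampled bits are fixed by the prefix; with nearly uniform conditionals, as at the inner layer, the bits are close to fair coin flips.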

Note that a public channel is used to communicate the information reconciliation
information to Bob, enabling reliable decoding. However, it is possible to
dispense with the public channel and still achieve the same rate and efficiency
properties, as will be discussed in Section 4.3.

In the following we assume that the message $M^J$ to be transmitted is uniformly
distributed over the message set $\mathcal{M}^J = \{0, 1\}^J$. As mentioned in Section 2.4,
it may be desirable to have a private coding scheme that works for an
arbitrarily distributed message. This can be achieved by assuming that the wiretap
channel $W$ is symmetric, or more precisely, by assuming that the two channels
$W_1 : \mathcal{X} \to \mathcal{Y}$ and $W_2 : \mathcal{X} \to \mathcal{Z}$ induced by $W$ are symmetric. We can define
a super-channel $W' : \mathcal{T} \to \mathcal{Y}^L \times \mathcal{Z}^L \times \mathcal{C}$ which consists of an inner encoding
block and $L$ basic channels $W$. The super-channel $W'$ again induces two channels
$W'_1 : \mathcal{T} \to \mathcal{Y}^L \times \mathcal{C}$ and $W'_2 : \mathcal{T} \to \mathcal{Z}^L \times \mathcal{C}$. Arıkan showed that $W_1$ respectively $W_2$
being symmetric implies that $W'_1$ respectively $W'_2$ is symmetric [5, Proposition
13]. It has been shown in [24, Proposition 3] that for symmetric channels polar
codes remain reliable for an arbitrary distribution of the message bits. We thus
conclude that if $W_1$ is assumed to be symmetric, our coding scheme remains reliable
for arbitrarily distributed messages. Assuming a symmetric channel
$W_2$ implies that $W'_2$ is symmetric, which proves that our scheme is strongly secure
for arbitrarily distributed messages.$^{13}$


12 As it happens, $G_L$ is its own inverse.

13 This can be seen easily by the strong secrecy condition given in (31) using that $W'_2$
is symmetric.

Protocol 2: Private channel coding

Given: Index sets $\mathcal{E}_K$ and $\mathcal{F}_J$ (code construction)$^{14}$

Notation: Message to be transmitted: $m^J$

Outer enc.: Let $u^M[\mathcal{F}_J] = m^J$ and $u^M[\mathcal{F}_J^c] = r^{KM-J}$ where $r^{KM-J}$ is (randomly)
generated as explained in [33, Definition 1].$^{15}$ Let $t^M = \tilde{G}^K_M u^M$.

Inner enc.: For all $i \in \{0, L, \ldots, L(M-1)\}$, Alice does the following: let
$v_{i+1}^{i+L}[\mathcal{E}_K] = t_{(i/L)+1}$ and $v_{i+1}^{i+L}[\mathcal{E}_K^c] = s_{i+1}^{i+L-K}$ where $s_{i+1}^{i+L-K}$ is (randomly)
generated as explained in [T5] Section 4]. Send $C_{(i/L)+1} := s_{i+1}^{i+L-K}$ over
a public channel to Bob. Finally, compute $x_{i+1}^{i+L} = G_L v_{i+1}^{i+L}$.

Transmis.: $(y^N, z^N) = W^N x^N$

Inner dec.: Bob uses the standard decoder [5,12] with inputs $C_{(i/L)+1}$ and $y_{i+1}^{i+L}$
to obtain $\hat{v}_{i+1}^{i+L}$, and hence $\hat{t}_{(i/L)+1} = \hat{v}_{i+1}^{i+L}[\mathcal{E}_K]$, for each
$i \in \{0, L, \ldots, L(M-1)\}$.

Outer dec.: Bob computes $\hat{u}^M = \tilde{G}^K_M \hat{t}^M$ and outputs a guess for the sent message
$\hat{m}^J = \hat{u}^M[\mathcal{F}_J]$.

4.1 Rate, Reliability, Secrecy, and Efficiency 
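Corollary 9. For any $\beta < \frac{1}{2}$, Protocol 2 satisfies

Reliability: $\Pr[M^J \neq \hat{M}^J] = O\big(M 2^{-L^{\beta}}\big)$ (55)

Secrecy: $\| P_{M^J,Z^N,C} - \bar{P}_{M^J} \times P_{Z^N,C} \|_1 = O\big(\sqrt{N}\, 2^{-N^{\beta}/2}\big)$ (56)

Rate: $R = H(X|Z) - \frac{1}{L} H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big) - \frac{o(N)}{N}$ (57)

and its computational complexity is $O(N \log N)$.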

Proof. Recall that the idea of the private channel coding scheme is to run Protocol 1
backwards. Since Protocol 2 simulates the nearly deterministic bits $U^M[\mathcal{F}_J^c]$
at the outer encoder as described in [33, Definition 1] and the almost random
bits $V^L[\mathcal{E}_K^c]$ at the inner encoder as explained in [T5] Section 4], it follows that
for large values of $L$ and $M$ the private channel coding scheme approximates
the one-way secret-key scheme setup,$^{16}$ i.e., $\lim_{N\to\infty} \delta\big(P_{T^M}, P_{(V^L[\mathcal{E}_K])^M}\big) = 0$
and $\lim_{L\to\infty} \delta\big(P_{X^L}, P_{\hat{X}^L}\big) = 0$, where $P_{X^L}$ denotes the distribution of the vector
$X^L$ which is sent over the wiretap channel $W$ and $P_{\hat{X}^L}$ denotes the distribution
of Alice’s random variable $X^L$ in the one-way secret-key agreement setup. We

14 By the code construction the channel input distribution $P_X$ is defined. $P_X$ should
be chosen such that it maximizes the scheme’s rate.

15 Again an abuse of notation. See Footnote 11 of Protocol 1.

16 This approximation can be made arbitrarily precise for sufficiently large values of $L$
and $M$.

Fig. 2. The private channel coding scheme for the setup $N = 8$, $L = 4$, $M = 2$,
$K = 2$, and $J = 2$. The message $M^J$ is first sent through an outer encoder which
adds some bits (simulated as explained in [T5] Section 4]) and applies the polarization
transform $\tilde{G}^K_M$. The output $T^M = (T_{(1)}, \ldots, T_{(K)})^M$ is then encoded a second time
by $M$ independent identical blocks. Note that each block again adds redundancy (as
explained in [34] Definition 1]) before applying the polarization transform $G_L$. Each
inner encoding block sends the frozen bits over a public channel to Bob. Note that
this extra public communication can be avoided as justified in Section 4.3. The output
$X^N$ is then sent over $N$ copies of the wiretap channel $W$ to Bob. Bob then applies a
decoding operation as in the key agreement scheme, Section 3.

thus can use the decoder introduced in [5] to decode the inner layer. Since we
are using $M$ identical independent inner decoding blocks, by the union bound
we obtain the desired reliability condition. The secrecy and rate statements are
immediate consequences of Theorem 7. □

As mentioned after Theorem 7, to ensure reliability of the protocol, $M$ may
not grow exponentially fast in $L$.

Corollary 10. The rate of Protocol 2 given in Corollary 9 can be bounded as

$$R \ge \max\Big\{0,\; H(X|Z) - H(X|Y) - \frac{o(N)}{N}\Big\}. \tag{58}$$

Proof. The proof is identical to the proof of Corollary [8] □ 


4.2 Achieving the Secrecy Capacity of a Wiretap Channel 

Corollaries 6 and 10 immediately imply that our private channel coding scheme
achieves the secrecy capacity for the setup where $W$ is more capable. If we can
find the optimal auxiliary random variable $V$ in (33), Protocol 2 can achieve
the secrecy capacity for a general wiretap channel scenario. We define a super-channel
$\overline{W} : \mathcal{V} \to \mathcal{Y} \times \mathcal{Z}$ which includes the random variable $X$ and the wiretap
channel $W$. The super-channel $\overline{W}$ is characterized by its transition probability
distribution $P_{Y,Z|V}$ where $V$ is the optimal random variable solving (33). The
private channel coding scheme is then applied to the super-channel, achieving
the secrecy capacity. Note that finding the optimal random variable $V$ might be
difficult.

In Section 5 we discuss the question of whether Protocol 2 can achieve a
rate that is strictly larger than $\max\{0, \max_{P_X} H(X|Z) - H(X|Y)\}$ if nothing
about the optimal auxiliary random variable $V$ is known.

4.3 Code Construction and Public Channel Communication 

To construct the code the index sets $\mathcal{E}_K$ and $\mathcal{F}_J$ as defined in (35) and (36) need
to be computed. This can be done as explained in Section 3.3. One first chooses
a distribution $P_X$ that maximizes the scheme’s rate given in (57), before looking
for a code that defines this distribution $P_X$.

We next explain how the communication $C^M \in \mathcal{C}^M$ from Alice to Bob can be
reduced such that it does not affect the rate, i.e., we show that we can choose
$|\mathcal{C}| = o(L)$. Recall that we defined the index set $\mathcal{E}_K := \mathcal{D}^{\epsilon_1}_L(X|Y)$ in (35). Let
$\mathcal{G} := \mathcal{R}^{\epsilon_1}_L(X|Y)$ using the notation introduced in d2J) and
$\mathcal{I} := [L] \setminus (\mathcal{E}_K \cup \mathcal{G}) = \mathcal{E}_K^c \setminus \mathcal{G}$.
As explained in Section 2.2, $\mathcal{G}$ consists of the outputs $V_j$ which are essentially
uniformly random, even given all previous outputs $V^{j-1}$ as well as $Y^L$, where
$V^L = G_L X^L$. The index set $\mathcal{I}$ consists of the outputs $V_j$ which are neither
essentially uniformly random nor essentially deterministic given $V^{j-1}$ and $Y^L$.
The polarization phenomenon stated in Theorem 1 ensures that this set is small,
i.e., that $|\mathcal{I}| = o(L)$. Since the bits of $\mathcal{G}$ are almost uniformly distributed, we can
fix these bits independently of the message (as part of the code construction)
without affecting the reliability of the scheme for large blocklengths.$^{17}$ We thus
only need to communicate the bits belonging to the index set $\mathcal{I}$.
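The three-way split into essentially deterministic, essentially random, and intermediate indices can be illustrated numerically in a special case. The sketch assumes a uniform binary source $X$ whose side information $Y$ is $X$ passed through a binary erasure channel; in that case the conditional entropies $H(V_j \mid V^{j-1}, Y^L)$ follow the exact recursion $e \mapsto (2e - e^2,\, e^2)$. The ordering of the returned list follows the recursion rather than any particular bit ordering of $G_L$, and the thresholds and function names are assumptions for this example.

```python
def bec_conditional_entropies(levels, erasure_prob):
    """Conditional entropies of the L = 2**levels transformed bits for erasure side information."""
    entropies = [erasure_prob]
    for _ in range(levels):
        entropies = [val for e in entropies for val in (2 * e - e * e, e * e)]
    return entropies


def partition(entropies, eps):
    """Split indices into deterministic (<= eps), random (>= 1 - eps), and intermediate sets."""
    deterministic = [j for j, h in enumerate(entropies) if h <= eps]
    uniform = [j for j, h in enumerate(entropies) if h >= 1 - eps]
    intermediate = [j for j, h in enumerate(entropies) if eps < h < 1 - eps]
    return deterministic, uniform, intermediate


small = bec_conditional_entropies(4, 0.5)       # L = 16
large = bec_conditional_entropies(16, 0.5)      # L = 65536
d_small, g_small, i_small = partition(small, 0.01)
d_large, g_large, i_large = partition(large, 0.01)
frac_small = len(i_small) / len(small)
frac_large = len(i_large) / len(large)
```

In this toy case the fraction of intermediate indices shrinks as the block length grows, in line with $|\mathcal{I}| = o(L)$, while the sum of all conditional entropies stays equal to $L$ times the erasure probability.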

We can send the bits belonging to $\mathcal{I}$ over a separate public noiseless channel.
Alternatively, we could send them over the wiretap channel $W$ that we are using
for private channel coding. However, since $W$ is assumed to be noisy and it is
essential that the bits in $\mathcal{I}$ are received by Bob without any errors, we need to
protect them using an error correcting code. To not destroy the essentially linear
computational complexity of our scheme, the code needs to have an encoder and
decoder that are practically efficient. Since $|\mathcal{I}| = o(L)$, we can use any error
correcting code that has a non-vanishing rate. For symmetric binary DMCs,
polar coding can be used to transmit reliably an arbitrarily distributed message
[551 Proposition 3]. We can therefore symmetrize our wiretap channel $W$ and use
polar codes to transmit the bits in $\mathcal{I}$.$^{18}$

17 Recall that we choose $\epsilon_1 = O(2^{-L^{\beta}})$ for any $\beta < \frac{1}{2}$ such that for $L \to \infty$ the index
set $\mathcal{G}$ contains only uniformly distributed bits.

18 Note that the symmetrization of the channel will reduce its rate, which however does
not matter as we need a non-vanishing rate only.

As the reliability of the scheme is the average over the possible assignments of
the random bits belonging to $\mathcal{I}$ (or even $\mathcal{E}_K^c$), at least one choice must be as good
as the average, meaning a reliable, efficient, and deterministic scheme must exist.
However, it might be computationally hard to find this choice. This means that
there exists a scheme for private channel coding (having the properties given in
Corollary 9) that does not require any extra communication from Alice to Bob,
i.e., $C = \emptyset$; however, its code construction might be computationally inefficient.

5 Conclusion and Open Problems 

We have constructed practically efficient protocols (with complexity essentially
linear in the blocklength) for one-way secret-key agreement from correlated
randomness and for private channel coding over discrete memoryless wiretap
channels. Each protocol achieves the corresponding optimal rate. Compared to
previous methods, we do not require any degradability assumptions and achieve
strong (rather than weak) secrecy. Our scheme is formulated for arbitrary discrete
memoryless wiretap channels. Using ideas of Şaşoğlu et al. [TU] the two
protocols presented in this paper can also be used for wiretap channels with
continuous input alphabets.

Finally, we want to describe an open problem which addresses the question of
whether rates beyond $\max\{0, H(X|Z) - H(X|Y)\}$ can be achieved by our key
agreement scheme, even if the optimal auxiliary random variables $V$ and $U$ are
not given, i.e., if we run Protocol 1 directly for $X$ (instead of $U$) without making
$V$ public. The question could also be formulated in the private coding scenario,
whether rates beyond $\max\{0, \max_{P_X} H(X|Z) - H(X|Y)\}$ are possible, but as
a positive answer in the former context implies a positive answer in the latter,
we shall restrict attention to the key agreement scenario for simplicity.

Question 1. For some distributions $P_{X,Y,Z}$, does the rate of Protocol 1 satisfy

$$R > \max\{0, H(X|Z) - H(X|Y)\}, \quad \text{for } N \to \infty? \tag{59}$$

An equivalent formulation of this question is whether inequality (53) is always
tight for large enough $N$, i.e.,

Question 1'. Is it possible that

$$\lim_{L\to\infty} \frac{1}{L} H\big(V^L[\mathcal{E}_K^c] \,\big|\, Z^L\big) < \lim_{L\to\infty} \frac{1}{L} |\mathcal{E}_K^c|, \quad \text{for } R > 0? \tag{60}$$

From the polarization phenomenon stated in Theorem 1 we obtain
$\lim_{L\to\infty} \frac{1}{L} |\mathcal{E}_K^c| = H(X|Y)$, which together with (60) would imply that
$R > \max\{0, H(X|Z) - H(X|Y)\}$ for $N \to \infty$ is possible. Relation (60) can only be
satisfied if the high-entropy set with respect to Bob’s side information, i.e., the set
$\mathcal{E}_K^c$, is not always a high-entropy set with respect to Eve’s side information. Thus,
the question of rates in the key agreement protocol is closely related to fundamental
structural properties of the polarization phenomenon.

A positive answer to Question 1 implies that we can send quantum information
reliably over a quantum channel at a rate that is beyond the coherent information
using the scheme introduced in [B].

Abstract. In 2009, Abdalla et al. proposed a reasonably practical
password-authenticated key exchange (PAKE) secure against adaptive adversaries
in the universal composability (UC) framework. It exploited the
Canetti-Fischlin methodology for commitments and the Cramer-Shoup
smooth projective hash functions (SPHFs), following the Gennaro-Lindell
approach for PAKE. In this paper, we revisit the notion of non-interactive
commitments, with a new formalism that implies UC security. In addition,
we provide a quite efficient instantiation. We then extend our
formalism to SPHF-friendly commitments. We thereafter show that it
allows a blackbox application to one-round PAKE and oblivious transfer
(OT), still secure in the UC framework against adaptive adversaries,
assuming reliable erasures and a single global common reference string,
even for multiple sessions. Our instantiations are more efficient than the
Abdalla et al. PAKE in Crypto 2009 and the recent OT protocol proposed
by Choi et al. in PKC 2013. Furthermore, the new PAKE instantiation
is the first one-round scheme achieving UC security against adaptive
adversaries.


1 Introduction 

Commitment schemes are one of the most fundamental primitives in cryptography,
serving as a building block for many cryptographic applications such
as zero-knowledge proofs [55] and secure multi-party computation [5T|. In a typical
commitment scheme, there are two main phases. In a commit phase, the
committer computes a commitment $C$ for some message $x$ and sends it to the
receiver. Then, in an opening phase, the committer releases some information $\delta$
to the receiver which allows the latter to verify that $C$ was indeed a commitment
of $x$. To be useful in practice, a commitment scheme should satisfy two basic
security properties. The first one is hiding, which informally guarantees that no
information about $x$ is leaked through the commitment $C$. The second one is
binding, which guarantees that the committer cannot generate a commitment $C$
that can be successfully opened to two different messages.

Smooth Projective Hash Functions (SPHFs) were introduced by Cramer
and Shoup m as a means to design chosen-ciphertext-secure public-key encryption
schemes. In addition to providing a more intuitive abstraction for their
original public-key encryption scheme in [15], the notion of SPHF also enabled
new efficient instantiations of their scheme under different complexity assumptions,
such as quadratic residuosity. Due to its usefulness, the notion of SPHF
was later extended to several other contexts, such as password-authenticated key
exchange (PAKE) [50], oblivious transfer (OT) [27115], and blind signatures [716].

Password-Authenticated Key Exchange (PAKE) protocols, in which authentication
is done using a simple password, possibly drawn from a small space subject to
exhaustive search, were proposed in 1992 by Bellovin and Merritt [5].
Since then, many schemes have been proposed and studied. SPHFs have been
extensively used, starting with the work of Gennaro and Lindell [5U] which generalized
an earlier construction by Katz, Ostrovsky, and Yung (KOY) [55], and
followed by several other works [1112]. More recently, a variant of SPHFs proposed
by Katz and Vaikuntanathan even allowed the construction of one-round
PAKE schemes [3016].

The first ideal functionality for PAKE protocols in the UC framework [8112] 
was proposed by Canetti et al. DU* who showed how a simple variant of the 
Gennaro-Lindell methodology [20] could lead to a secure protocol. Though quite
efficient, their protocol was not known to be secure against adaptive adversaries,
which are capable of corrupting players at any time and learning their internal states.
The first ones to propose an adaptively secure PAKE in the UC framework were 
Barak et al. [3] using general techniques from multi-party computation (MPC). 
Though conceptually simple, their solution results in quite inefficient schemes. 

The first reasonably practical adaptively secure PAKE was proposed by Abdalla
et al. [2], following the Gennaro-Lindell methodology with the Canetti-Fischlin
commitment [10]. They had to build a complex SPHF to handle the
verification of such a commitment. Thus, the communication complexity was 
high and the protocol required four rounds. No better adaptively secure scheme 
has been proposed so far. 

Oblivious Transfer (OT) was introduced in 1981 by Rabin [33] as a way to
allow a receiver to get exactly one out of k messages sent by another party, the 
sender. In these schemes, the receiver should be oblivious to the other values, 
and the sender should be oblivious to which value was received. Since then, 
several instantiations and optimizations of such protocols have appeared in the 
literature, including proposals in the UC framework [31113].

More recently, new instantiations have been proposed, trying to reach
round-optimality [55], or low communication costs [33]. The 1-out-of-2 OT scheme by
Choi et al. US based on the DDH assumption seems to be the most efficient 
one among those that are secure against adaptive corruptions in the CRS model 
with erasures. But it does not scale to 1-out-of-k OT, for k > 2.


1.1 Properties of Commitment Schemes 

Basic Properties. In addition to the binding and hiding properties, certain 
applications may require additional properties from a commitment scheme. One 
such property is equivocability [2], which guarantees that a commitment C can
be opened in more than a single way when in possession of a certain trapdoor 
information. Another one is extractability , which allows the computation of the 
message x committed in C when in possession of a certain trapdoor information. 
Yet another property that may also be useful for cryptographic applications is 
non-malleability [18], which ensures that the receiver of an unopened
commitment C for a message x cannot generate a commitment for a message that is
related to x. 

Though commitment schemes satisfying stronger properties such as 
non-malleability, equivocability, and extractability may be useful for solving spe- 
cific problems, they usually stop short of guaranteeing security when composed 
with arbitrary protocols. To address this problem, Canetti and Fischlin [10]
proposed an ideal functionality for commitment schemes in the universal
composability (UC) framework [8], which guarantees all these properties simultaneously
and remains secure even under concurrent composition with arbitrary protocols.
Unfortunately, they also showed that such commitment schemes can only be 
realized if one makes additional setup assumptions, such as the existence of a 
common reference string (CRS) [10], random oracles [13, or secure hardware
tokens [28] . 

Equivocable and Extractable Commitments. Like the work of Canetti and
Fischlin [10], this work also aims to build non-interactive commitment schemes
which can simultaneously guarantee non-malleability, equivocability, and
extractability properties. To this end, we first define a new notion of commitment
scheme, called E²-commitments, for which there exists an alternative setup
algorithm, whose output is computationally indistinguishable from that of a normal
setup algorithm and which outputs a common trapdoor that allows for both 
equivocability and extractability: this trapdoor not only allows for the extraction 
of a committed message, but it can also be used to create simulated commitments 
which can be opened to any message. 

To define the security of E²-schemes, we first extend the security notions
of standard equivocable commitments and extractable commitments to the
E²-commitment setting. Since the use of a common trapdoor for equivocability and
extractability could potentially be exploited by an adversary to break the
extractability or equivocability properties of an E²-commitment scheme, we define
stronger versions of these notions, which account for the fact that the same
trapdoor is used for both extractability and equivocability. In particular, in these
stronger notions, the adversary is given oracle access to the simulated commit- 
ment and extractor algorithms. 

Finally, after defining the security of E²-schemes, we further show that these
schemes remain secure even under arbitrary composition with other
cryptographic protocols. More precisely, we show that any E²-commitment scheme
which meets the strong versions of the equivocability or extraction notions is
a non-interactive UC-secure (multiple) commitment scheme in the presence of 
adaptive adversaries, assuming reliable erasures and a single global CRS. 

SPHF-Friendly Commitments. In this work, we are interested in building
non-interactive E²-commitments, to which smooth projective hash functions can
be efficiently associated. Unfortunately, achieving this goal is not so easy due to
the equivocability property of E²-commitments. To understand why, let X be
the domain of an SPHF and let L be some underlying NP language such
that it is computationally hard to distinguish a random element in L from a
random element in X \ L. A key property of these SPHFs that makes them
so useful for applications such as PAKE and OT is that, for words C in L, their
values can be computed using either a secret hashing key hk or a public projected
key hp together with a witness w to the fact that C is indeed in L. A typical example
of a language in which we are interested is the language $L_x$ corresponding to
the set of elements {C} such that C is a valid commitment of x. Unfortunately,
when commitments are equivocable, the language $L_x$ containing the set of valid
commitments of x may not be well defined since a commitment C could
potentially be opened to any x. To get around this problem and be able to use SPHFs
with E²-commitments, we show that it suffices for an E²-commitment scheme to
satisfy two properties. The first one is the stronger version of the equivocability
notion, which guarantees that equivocable commitments are computationally
indistinguishable from normal commitments, even when given oracle access to the
simulated commitment and extractor algorithms. The second one, which is called
robustness, is new and guarantees that commitments generated by
polynomially-bounded adversaries are perfectly binding. Finally, we say that a commitment
scheme is SPHF-friendly if it satisfies both properties and if it admits an SPHF
on the languages $L_x$.


1.2 Contributions 

A New SPHF-friendly E²-commitment Construction. First, we define the
notion of SPHF-friendly E²-commitment together with an instantiation. The new
construction, which is called E²C and described in Section 4, is inspired by the
commitment schemes in [10, 13, 2]. Like the construction in [2], it combines a
variant of the Cramer-Shoup encryption scheme (as an extractable commitment
scheme) and an equivocable commitment scheme to be able to simultaneously
achieve both equivocability and extractability. However, unlike the construction
in [2], we rely on Haralambiev’s perfectly hiding commitment [23, Section 4.1.4],
instead of the Pedersen commitment [32].

Since the opening value of Haralambiev’s scheme is a group element that 
can be encrypted in one ElGamal-like ciphertext to allow extractability, this 
globally leads to a better communication and computational complexity for the 
commitment. The former is linear in $m \cdot \mathfrak{K}$, where m is the bit-length of the
committed value and $\mathfrak{K}$ is the security parameter. This is significantly better than
the extractable commitment construction in [2], which was linear in $m \cdot \mathfrak{K}^2$, but
asymptotically worse than the two proposals in m that are linear in $\mathfrak{K}$, and
thus independent of m. However, we point out the latter proposals in pH] are 
not SPHF-friendly since they are not robust. 

We then show in Theorem 4 that a labeled E²-commitment satisfying stronger
notions of equivocability and extractability is a non-interactive UC-secure
commitment scheme in the presence of adaptive adversaries, assuming reliable
erasures and a single global CRS, and we apply this result to our new construction.

One-Round Adaptively Secure PAKE. Second, we provide a generic
construction of a one-round UC-secure PAKE from any SPHF-friendly commitment.
The UC-security holds against adaptive adversaries, assuming reliable erasures
and a single global CRS, as shown in Section 6. In addition to being the first
one-round adaptively secure PAKE, our new scheme also enjoys a much better
communication complexity than previous adaptively secure PAKE schemes. For
instance, in comparison to the PAKE in [2], which is currently the most efficient
adaptively secure PAKE, the new scheme gains a factor of $\mathfrak{K}$ in the overall
communication complexity, where $\mathfrak{K}$ is the security parameter. However, unlike their
scheme, our new construction requires pairing-friendly groups.

Three-round Adaptively Secure 1-out-of-k OT. Third, we provide a generic
construction of a three-round UC-secure 1-out-of-k OT from any SPHF-friendly
commitment. The UC-security holds against adaptive adversaries, assuming
reliable erasures and a single global CRS, as shown in Section 7. Besides decreasing
the total number of rounds with respect to existing OT schemes with similar 
security levels, our resulting protocol also has a better communication complex- 
ity than the best known solution so far m ■ Moreover, our construction is more 
general and provides a solution for 1-out-of-k OT schemes while the solution in
m only works for k = 2. 

Due to space restrictions, complete proofs and some details were postponed 
to the full version [1].

2 Basic Notions for Commitments 

We first review the basic definitions of non-interactive commitments, with some 
examples. Then, we consider the classical additional notions of equivocability 
and extractability. In this paper, the qualities of adversaries will be measured by
their successes and advantages in certain experiments $\mathsf{Exp}^{\mathsf{sec}}$ or $\mathsf{Exp}^{\mathsf{sec}\text{-}b}$ (between
the cases b = 0 and b = 1), denoted $\mathsf{Succ}^{\mathsf{sec}}(\mathcal{A}, \mathfrak{K})$ and $\mathsf{Adv}^{\mathsf{sec}}(\mathcal{A}, \mathfrak{K})$ respectively,
while the security of a primitive will be measured by the maximal successes or
advantages of any adversary running within a time bounded by some t in the
appropriate experiments, denoted $\mathsf{Succ}^{\mathsf{sec}}(t)$ and $\mathsf{Adv}^{\mathsf{sec}}(t)$ respectively.
Adversaries can keep state during the different phases. We denote by $\stackrel{\$}{\leftarrow}$ the outcome of
a probabilistic algorithm or the sampling from a uniform distribution.


2.1 Non-interactive Labeled Commitments 

A non-interactive labeled commitment scheme C is defined by three algorithms: 

— $\mathsf{SetupCom}(1^{\mathfrak{K}})$ takes as input the security parameter $\mathfrak{K}$ and outputs the global
parameters, passed through the CRS ρ to all other algorithms;
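
$\mathsf{Exp}^{\mathsf{hid}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$
    $\rho \stackrel{\$}{\leftarrow} \mathsf{SetupCom}(1^{\mathfrak{K}})$
    $(\ell, x_0, x_1, \mathsf{state}) \stackrel{\$}{\leftarrow} \mathcal{A}(\rho)$
    $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{Com}^{\ell}(x_b)$
    return $\mathcal{A}(\mathsf{state}, C)$

$\mathsf{Exp}^{\mathsf{bind}}_{\mathcal{A}}(\mathfrak{K})$
    $\rho \stackrel{\$}{\leftarrow} \mathsf{SetupCom}(1^{\mathfrak{K}})$
    $(C, \ell, x_0, \delta_0, x_1, \delta_1) \stackrel{\$}{\leftarrow} \mathcal{A}(\rho)$
    if $\neg\mathsf{VerCom}^{\ell}(C, x_0, \delta_0)$ then return 0
    if $\neg\mathsf{VerCom}^{\ell}(C, x_1, \delta_1)$ then return 0
    return $x_0 \neq x_1$


Fig. 1. Hiding and Binding Properties
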


— $\mathsf{Com}^{\ell}(x)$ takes as input a label ℓ and a message x, and outputs a pair (C, δ),
where C is the commitment of x for the label ℓ, and δ is the corresponding
opening data (a.k.a. decommitment information). This is a probabilistic
algorithm;

— $\mathsf{VerCom}^{\ell}(C, x, \delta)$ takes as input a commitment C, a label ℓ, a message x, and
the opening data δ and outputs 1 (true) if δ is a valid opening data for C, x
and ℓ. It always outputs 0 (false) on x = ⊥.

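Using the experiments $\mathsf{Exp}^{\mathsf{hid}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$ and $\mathsf{Exp}^{\mathsf{bind}}_{\mathcal{A}}(\mathfrak{K})$ defined in Figure 1, one can
state the basic properties:

— Correctness: for all correctly generated CRS ρ, all commitments and opening
data honestly generated pass the verification VerCom test: for all ℓ, x, if
$(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{Com}^{\ell}(x)$, then $\mathsf{VerCom}^{\ell}(C, x, \delta) = 1$;

— Hiding Property: the commitment does not leak any information about the
committed value. $\mathcal{C}$ is said $(t, \varepsilon)$-hiding if $\mathsf{Adv}^{\mathsf{hid}}_{\mathcal{C}}(t) \le \varepsilon$.

— Binding Property: no adversary can open a commitment in two different
ways. $\mathcal{C}$ is said $(t, \varepsilon)$-binding if $\mathsf{Succ}^{\mathsf{bind}}_{\mathcal{C}}(t) \le \varepsilon$.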

Correctness is always perfectly required, and one can also require either the 
binding or the hiding property to be perfect. 

The reader may notice that labels are actually useless in the hiding and
the binding properties. But they will become useful in the E²-commitment schemes
introduced in the next section. This is somewhat similar to encryption schemes:
labels are useless with encryption schemes which are just IND-CPA, but are very
useful with IND-CCA encryption schemes.

2.2 Perfectly Binding Commitments: Public-Key Encryption 

To get perfectly binding commitments, classical instantiations are public-key 
encryption schemes, which additionally provide extractability (see below). The 
encryption algorithm is indeed the commitment algorithm, and the random coins
become the opening data that allows one to check that the commit phase was
performed correctly. The hiding property relies on the indistinguishability (IND-CPA), which is
computationally achieved, whereas the binding property relies on the correctness 
of the encryption scheme and is perfect. 

Let us define the ElGamal-based commitment scheme:

— $\mathsf{SetupCom}(1^{\mathfrak{K}})$ chooses a cyclic group G of prime order p, g a generator for
this group and a random scalar $z \stackrel{\$}{\leftarrow} \mathbb{Z}_p$. It sets the CRS $\rho = (G, g, h = g^z)$;

— $\mathsf{Com}(M)$, for $M \in G$, chooses a random element $r \stackrel{\$}{\leftarrow} \mathbb{Z}_p$ and outputs the
pair $(C = (u = g^r, e = h^r \cdot M), \delta = r)$;

— $\mathsf{VerCom}(C = (u, e), M, \delta = r)$ checks whether $C = (u = g^r, e = h^r \cdot M)$.

This commitment scheme is hiding under the DDH assumption and perfectly
binding. It is even extractable using the decryption key z: $M = e/u^z$. However,
it is not labeled. The Cramer-Shoup encryption scheme m admits labels and 
is extractable and non-malleable, thanks to the IND-CCA security level. 
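
The ElGamal-based commitment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy group (the squares modulo the safe prime 2039, of prime order 1019) is far too small to be secure, and the function names `setup_com`, `com`, `ver_com` and `ext_com` are hypothetical stand-ins for SetupCom, Com, VerCom and the extraction with the decryption key z.

```python
import secrets

# Toy parameters (an assumption for illustration, not secure):
# G is the subgroup of squares modulo the safe prime P = 2q + 1, of prime order q.
P, q, g = 2039, 1019, 4


def setup_com():
    """Return the CRS (g, h = g^z) and the decryption key z (extraction trapdoor)."""
    z = secrets.randbelow(q - 1) + 1
    return (g, pow(g, z, P)), z


def com(crs, M):
    """Commit to a group element M; the random coins r are the opening data."""
    g_, h = crs
    r = secrets.randbelow(q)
    return (pow(g_, r, P), pow(h, r, P) * M % P), r


def ver_com(crs, C, M, r):
    """Recompute the ciphertext from (M, r) and compare it with C."""
    g_, h = crs
    return C == (pow(g_, r, P), pow(h, r, P) * M % P)


def ext_com(z, C):
    """Extract M = e / u^z using the decryption key z."""
    u, e = C
    return e * pow(pow(u, z, P), P - 2, P) % P  # modular inverse via Fermat
```

Because the ciphertext determines M uniquely once h is fixed, no second opening to a different message can pass `ver_com`, which is the perfect binding property stated above.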


2.3 Perfectly Hiding Commitments 

The Pedersen scheme [32] is the most famous perfectly hiding commitment:
$\mathsf{Com}(m) = g^m h^r$ for a random scalar $r \stackrel{\$}{\leftarrow} \mathbb{Z}_p$ and a fixed basis $h \in G$. The
binding property relies on the DL assumption. Unfortunately, the opening value
is the scalar r, which makes it hard to encrypt/decrypt efficiently, as required
in our construction below. Haralambiev [23, Section 4.1.4] recently proposed a
new commitment scheme, called TC4 (without label), with a group element as
opening value:

— $\mathsf{SetupCom}(1^{\mathfrak{K}})$ chooses an asymmetric pairing-friendly setting $(G_1, g_1, G_2, g_2, G_T, p, e)$,
with an additional independent generator $T \in G_2$. It sets the CRS
$\rho = (G_1, g_1, G_2, g_2, T, G_T, p, e)$;

— $\mathsf{Com}(x)$, for $x \in \mathbb{Z}_p$, chooses a random element $r \stackrel{\$}{\leftarrow} \mathbb{Z}_p$ and outputs the pair
$(C = g_2^r T^x, \delta = g_1^r)$;

— $\mathsf{VerCom}(C, x, \delta)$ checks whether $e(g_1, C/T^x) = e(\delta, g_2)$.

This commitment scheme is clearly perfectly hiding, since the groups are cyclic,
and for any $C \in G_2$, $x \in \mathbb{Z}_p$, there exists $\delta \in G_1$ that satisfies $e(g_1, C/T^x) = e(\delta, g_2)$.
More precisely, if $C = g_2^u$ and $T = g_2^t$, then $\delta = g_1^{u - tx}$ opens C to any x.
The binding property holds under the DDH assumption in $G_2$, as proven in [23,
Section 4.1.4].
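
To illustrate what perfect hiding means for the Pedersen scheme recalled at the start of this section, the following Python sketch enumerates every commitment value reachable for a fixed message in a toy group. The group parameters and the helper names (`setup_com`, `com`, `ver_com`, `all_commitments`) are assumptions made for illustration; the pairing-based TC4 scheme is not implemented here.

```python
import secrets

# Toy group (an assumption, not secure): squares modulo P = 2q + 1, prime order q.
P, q, g = 2039, 1019, 4


def setup_com():
    """Sample a second basis h = g^t; the exponent t is discarded."""
    t = secrets.randbelow(q - 1) + 1
    return (g, pow(g, t, P))


def com(crs, m, r=None):
    """Com(m) = g^m h^r; the scalar r is the opening value."""
    g_, h = crs
    if r is None:
        r = secrets.randbelow(q)
    return pow(g_, m % q, P) * pow(h, r % q, P) % P, r


def ver_com(crs, C, m, r):
    """Accept iff C is the commitment of m under randomness r."""
    return C == com(crs, m, r)[0]


def all_commitments(crs, m):
    """Every value Com(m) can take as r ranges over Z_q."""
    return {com(crs, m, r)[0] for r in range(q)}
```

In this toy setting, `all_commitments(crs, 0)` and `all_commitments(crs, 1)` are the same set (the whole group), so a commitment carries no information about the message, whereas finding two openings of one commitment would reveal the discrete logarithm of h.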

2.4 Equivocable Commitments 

An equivocable commitment scheme $\mathcal{C}$ extends the previous definition, with
SetupCom, Com, VerCom, and a second setup $\mathsf{SetupComT}(1^{\mathfrak{K}})$ that additionally
outputs a trapdoor τ, and

— $\mathsf{SimCom}^{\ell}(\tau)$ that takes as input the trapdoor τ and a label ℓ and outputs a
pair (C, eqk), where C is a commitment and eqk an equivocation key;

— $\mathsf{OpenCom}^{\ell}(\mathsf{eqk}, C, x)$ that takes as input a commitment C, a label ℓ, a
message x, and an equivocation key eqk for this commitment, and outputs an
opening data δ for C and ℓ on x.




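$\mathsf{Exp}^{\mathsf{sim\text{-}ind}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$
    $(\rho, \tau) \stackrel{\$}{\leftarrow} \mathsf{SetupComT}(1^{\mathfrak{K}})$
    $(\ell, x, \mathsf{state}) \stackrel{\$}{\leftarrow} \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot)}(\rho)$
    if b = 0 then $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{Com}^{\ell}(x)$
    else $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{SCom}^{\ell}(\tau, x)$
    return $\mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot)}(\mathsf{state}, C, \delta)$

$\mathsf{Exp}^{\mathsf{bind\text{-}ext}}_{\mathcal{A}}(\mathfrak{K})$
    $(\rho, \tau) \stackrel{\$}{\leftarrow} \mathsf{SetupComT}(1^{\mathfrak{K}})$
    $(C, \ell, x, \delta) \stackrel{\$}{\leftarrow} \mathcal{A}^{\mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$
    $x' \leftarrow \mathsf{ExtCom}^{\ell}(\tau, C)$
    if $x' = x$ then return 0
    else return $\mathsf{VerCom}^{\ell}(C, x, \delta)$


Fig. 2. Simulation Indistinguishability and Binding Extractability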


Let us denote SCom the algorithm that takes as input the trapdoor τ, a label ℓ
and a message x and which outputs $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{SCom}^{\ell}(\tau, x)$, computed as
$(C, \mathsf{eqk}) \stackrel{\$}{\leftarrow} \mathsf{SimCom}^{\ell}(\tau)$ and $\delta \leftarrow \mathsf{OpenCom}^{\ell}(\mathsf{eqk}, C, x)$. Three additional
properties are then associated: a correctness property, and two indistinguishability
properties, which all together imply the hiding property.

— Trapdoor Correctness: all simulated commitments can be opened on any
message: for all ℓ, x, if $(C, \mathsf{eqk}) \stackrel{\$}{\leftarrow} \mathsf{SimCom}^{\ell}(\tau)$ and $\delta \leftarrow \mathsf{OpenCom}^{\ell}(\mathsf{eqk}, C, x)$,
then $\mathsf{VerCom}^{\ell}(C, x, \delta) = 1$;

— Setup Indistinguishability: one cannot distinguish the CRS ρ generated by
SetupCom from the one generated by SetupComT. $\mathcal{C}$ is said $(t, \varepsilon)$-setup-indistinguishable
if the two distributions for ρ are $(t, \varepsilon)$-computationally
indistinguishable. We denote $\mathsf{Adv}^{\mathsf{setup\text{-}ind}}_{\mathcal{C}}(t)$ the distance between the two
distributions.

— Simulation Indistinguishability: one cannot distinguish a real commitment
(generated by Com) from a fake commitment (generated by SCom), even with
oracle access to fake commitments. $\mathcal{C}$ is said $(t, \varepsilon)$-simulation-indistinguishable
if $\mathsf{Adv}^{\mathsf{sim\text{-}ind}}_{\mathcal{C}}(t) \le \varepsilon$ (see the experiments $\mathsf{Exp}^{\mathsf{sim\text{-}ind}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$ in Figure 2).

More precisely, when the trapdoor correctness is satisfied, since commitments
generated by SimCom are perfectly hiding (they can be opened in any way using
OpenCom), $\mathsf{Adv}^{\mathsf{hid}}_{\mathcal{C}}(t) \le \mathsf{Adv}^{\mathsf{setup\text{-}ind}}_{\mathcal{C}}(t) + \mathsf{Adv}^{\mathsf{sim\text{-}ind}}_{\mathcal{C}}(t)$.

Definition 1 (Equivocable Commitment). A commitment scheme $\mathcal{C}$ is said
$(t, \varepsilon)$-equivocable if, first, the basic commitment scheme satisfies the correctness
property and is both $(t, \varepsilon)$-binding and $(t, \varepsilon)$-hiding, and, secondly, the additional
algorithms guarantee the trapdoor correctness and make it both $(t, \varepsilon)$-setup-indistinguishable
and $(t, \varepsilon)$-simulation-indistinguishable.
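
As a concrete instance of the SimCom/OpenCom interface, here is a hedged Python sketch of equivocation for the Pedersen commitment, where the trapdoor is the discrete logarithm of h. The toy group and the names `setup_com_t`, `sim_com` and `open_com` are illustrative assumptions, and storing the trapdoor inside the equivocation key is a simplification chosen so that `open_com` matches the (eqk, C, x) signature.

```python
import secrets

# Toy group (an assumption, not secure): squares modulo P = 2q + 1, prime order q.
P, q, g = 2039, 1019, 4


def setup_com_t():
    """Alternative setup: the CRS (g, h) plus the trapdoor tau = log_g(h)."""
    tau = secrets.randbelow(q - 1) + 1
    return (g, pow(g, tau, P)), tau


def ver_com(crs, C, m, r):
    """Pedersen verification: C == g^m h^r."""
    g_, h = crs
    return C == pow(g_, m % q, P) * pow(h, r % q, P) % P


def sim_com(crs, tau):
    """Output a simulated commitment C = h^s and an equivocation key."""
    s = secrets.randbelow(q)
    return pow(crs[1], s, P), (tau, s)


def open_com(eqk, C, m):
    """Solve m + tau * r = tau * s (mod q) for the opening value r."""
    tau, s = eqk
    return (s - m * pow(tau, q - 2, q)) % q
```

Trapdoor correctness then reads: for any message m, `ver_com(crs, C, m, open_com(eqk, C, m))` holds for every pair `(C, eqk)` returned by `sim_com`.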

2.5 Extractable Commitments 

An extractable commitment scheme $\mathcal{C}$ also extends the initial definition, with
SetupCom, Com, VerCom, as well as the second setup $\mathsf{SetupComT}(1^{\mathfrak{K}})$ that
additionally outputs a trapdoor τ, and

— $\mathsf{ExtCom}^{\ell}(\tau, C)$, which takes as input the trapdoor τ, a commitment C, and
a label ℓ, and outputs the committed message x, or ⊥ if the commitment is
invalid.

As above, three additional properties are then associated: a correctness
property, and the setup indistinguishability, but also an extractability property, which
implies, together with the setup indistinguishability, the binding property:

— Trapdoor Correctness: all commitments honestly generated can be correctly
extracted: for all ℓ, x, if $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{Com}^{\ell}(x)$ then $\mathsf{ExtCom}^{\ell}(\tau, C) = x$;

— Setup Indistinguishability: as above;

— Binding Extractability: one cannot fool the extractor, i.e., produce a
commitment and a valid opening data to an input x while the commitment does
not extract to x. $\mathcal{C}$ is said $(t, \varepsilon)$-binding-extractable if $\mathsf{Succ}^{\mathsf{bind\text{-}ext}}_{\mathcal{C}}(t) \le \varepsilon$ (see
the experiment $\mathsf{Exp}^{\mathsf{bind\text{-}ext}}_{\mathcal{A}}(\mathfrak{K})$ in Figure 2).

More precisely, when one breaks the binding property with $(C, \ell, x_0, \delta_0, x_1, \delta_1)$,
if the extraction oracle outputs $x' = x_0$, then one can output $(C, \ell, x_1, \delta_1)$,
otherwise one can output $(C, \ell, x_0, \delta_0)$. In both cases, this breaks the
binding-extractability: $\mathsf{Succ}^{\mathsf{bind}}_{\mathcal{C}}(t) \le \mathsf{Adv}^{\mathsf{setup\text{-}ind}}_{\mathcal{C}}(t) + \mathsf{Succ}^{\mathsf{bind\text{-}ext}}_{\mathcal{C}}(t)$.

Definition 2 (Extractable Commitment). A commitment scheme $\mathcal{C}$ is said
$(t, \varepsilon)$-extractable if, first, the basic commitment scheme satisfies the correctness
property and is both $(t, \varepsilon)$-binding and $(t, \varepsilon)$-hiding, and, secondly, the additional
algorithms guarantee the trapdoor correctness and make it both $(t, \varepsilon)$-setup-indistinguishable
and $(t, \varepsilon)$-binding-extractable.

3 Equivocable and Extractable Commitments 

3.1 E²-Commitments: Equivocable and Extractable

Public-key encryption schemes are perfectly binding commitments that are addi- 
tionally extractable. The Pedersen and Haralambiev commitments are perfectly 
hiding commitments that are additionally equivocable. But none of them have 
the two properties at the same time. This is now our goal. 

Definition 3 (E²-Commitment). A commitment scheme $\mathcal{C}$ is said $(t, \varepsilon)$-E²
(equivocable and extractable) if the indistinguishable setup algorithm outputs a
common trapdoor that allows both equivocability and extractability. If one denotes
$\mathsf{Adv}_{\mathcal{C}}(t)$ the maximum of $\mathsf{Adv}^{\mathsf{setup\text{-}ind}}_{\mathcal{C}}(t)$, $\mathsf{Adv}^{\mathsf{sim\text{-}ind}}_{\mathcal{C}}(t)$, and $\mathsf{Succ}^{\mathsf{bind\text{-}ext}}_{\mathcal{C}}(t)$, then
it should be upper-bounded by ε.

But with such a common trapdoor, the adversary could exploit the equivocation
queries to break extractability and extraction queries to break equivocability.
Stronger notions can thus be defined, using the experiments $\mathsf{Exp}^{\mathsf{s\text{-}sim\text{-}ind}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$
and $\mathsf{Exp}^{\mathsf{s\text{-}bind\text{-}ext}}_{\mathcal{A}}(\mathfrak{K})$ in Figure 3, in which SCom is supposed to store each
query/answer (ℓ, x, C) in a list Λ and ExtCom-queries on such an SCom-output
(ℓ, C) are answered by x (as it would be when using Com instead of SCom).

— Strong Simulation Indistinguishability: one cannot distinguish a real
commitment (generated by Com) from a fake commitment (generated by SCom),
even with oracle access to the extraction oracle (ExtCom) and to fake
commitments (using SCom). $\mathcal{C}$ is said $(t, \varepsilon)$-strongly-simulation-indistinguishable
if $\mathsf{Adv}^{\mathsf{s\text{-}sim\text{-}ind}}_{\mathcal{C}}(t) \le \varepsilon$;
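
$\mathsf{Exp}^{\mathsf{s\text{-}sim\text{-}ind}\text{-}b}_{\mathcal{A}}(\mathfrak{K})$
    $(\rho, \tau) \stackrel{\$}{\leftarrow} \mathsf{SetupComT}(1^{\mathfrak{K}})$
    $(\ell, x, \mathsf{state}) \stackrel{\$}{\leftarrow} \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot), \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$
    if b = 0 then $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{Com}^{\ell}(x)$
    else $(C, \delta) \stackrel{\$}{\leftarrow} \mathsf{SCom}^{\ell}(\tau, x)$
    return $\mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot), \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\mathsf{state}, C, \delta)$

$\mathsf{Exp}^{\mathsf{s\text{-}bind\text{-}ext}}_{\mathcal{A}}(\mathfrak{K})$
    $(\rho, \tau) \stackrel{\$}{\leftarrow} \mathsf{SetupComT}(1^{\mathfrak{K}})$
    $(C, \ell, x, \delta) \stackrel{\$}{\leftarrow} \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot), \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$
    $x' \leftarrow \mathsf{ExtCom}^{\ell}(\tau, C)$
    if $(\ell, x', C) \in \Lambda$ then return 0
    if $x' = x$ then return 0
    else return $\mathsf{VerCom}^{\ell}(C, x, \delta)$


Fig. 3. Strong Simulation Indistinguishability and Strong Binding Extractability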


— Strong Binding Extractability (informally introduced in [13] as “simulation
extractability”): one cannot fool the extractor, i.e., produce a commitment
and a valid opening data (not given by SCom) to an input x while the
commitment does not extract to x, even with oracle access to the extraction
oracle (ExtCom) and to fake commitments (using SCom). $\mathcal{C}$ is said
$(t, \varepsilon)$-strongly-binding-extractable if $\mathsf{Succ}^{\mathsf{s\text{-}bind\text{-}ext}}_{\mathcal{C}}(t) \le \varepsilon$.

They both imply the respective weaker notions since they just differ by giving 
access to the ExtCom-oracle in the former game, and to the SCom oracle in 
the latter. We insist that ExtCom-queries on SCom-outputs are answered by the 
related SCom-inputs. Otherwise, the former game would be void. In addition, 
VerCom always rejects inputs with x = ⊥, which is useful in the latter game.

3.2 UC-Secure Commitments 

The security definition for commitment schemes in the UC framework was
presented by Canetti and Fischlin [10], and refined by Canetti [9]. The ideal functionality
is presented in Figure 4, where a public delayed output is an output first sent to the
adversary S that eventually decides if and when the message is actually delivered to
the recipient. In case of corruption of the committer, if this is before the
Receipt-message for the receiver, the adversary chooses the committed value, otherwise it
is provided by the ideal functionality, according to the Commit-message. Note this
is actually the multiple-commitment functionality that allows multiple executions
of the commitment protocol (multiple ssid’s) for the same functionality instance
(one sid). This avoids the use of joint-state UC [14].

Theorem 4. A labeled E²-commitment scheme $\mathcal{C}$, that is in addition
strongly-simulation-indistinguishable or strongly-binding-extractable, is a non-interactive
UC-secure commitment scheme in the presence of adaptive adversaries, assuming 
reliable erasures and authenticated channels. 

The functionality $\mathcal{F}_{\mathsf{com}}$ is parametrized by a security parameter k. It interacts with
an adversary S and a set of parties $P_1, \ldots, P_n$ via the following queries:

Commit phase: Upon receiving a query (Commit, sid, ssid, $P_i$, $P_j$, x) from
party $P_i$: record the tuple (sid, ssid, $P_i$, $P_j$, x) and generate a public delayed output
(Receipt, sid, ssid, $P_i$, $P_j$) to $P_j$. Ignore further Commit-message with the same ssid.

Decommit phase: Upon receiving a query (Reveal, sid, ssid, $P_i$, $P_j$) from
party $P_i$: ignore the message if (sid, ssid, $P_i$, $P_j$, x) is not recorded; otherwise
mark the record (sid, ssid, $P_i$, $P_j$) as revealed and generate a public delayed
output (Revealed, sid, ssid, $P_i$, $P_j$, x) to $P_j$. Ignore further Reveal-message with the
same ssid from $P_i$.


Fig. 4. Ideal Functionality for Commitment Scheme


4 A Construction of Labeled E²-Commitment Scheme

4.1 Labeled Cramer-Shoup Encryption on Vectors

For our construction we use a variant of the Cramer-Shoup encryption scheme
for vectors of messages. Let G be a cyclic group of order p, with two independent
generators g and h. The secret decryption key is a random vector
$\mathsf{sk} = (x_1, x_2, y_1, y_2, z) \stackrel{\$}{\leftarrow} \mathbb{Z}_p^5$ and the public encryption key is
$\mathsf{pk} = (g, h, c = g^{x_1} h^{x_2}, d = g^{y_1} h^{y_2}, f = g^z, H)$, where H is randomly chosen in a
collision-resistant hash function family $\mathcal{H}$ (actually, second-preimage resistance is
enough). For a message-vector $\mathbf{M} = (M_i)_{i=1,\ldots,m} \in G^m$, the multi-Cramer-Shoup
encryption is defined as
$m\text{-}\mathsf{MCS}^{\ell}_{\mathsf{pk}}(\mathbf{M}; (r_i)_i) = (\mathsf{CS}^{\ell}_{\mathsf{pk}}(M_i, \theta; r_i) = (u_i = g^{r_i}, v_i = h^{r_i}, e_i = f^{r_i} \cdot M_i, w_i = (c d^{\theta})^{r_i}))_i$,
where $\theta = H(\ell, (u_i, v_i, e_i)_i)$ is the same for all the $w_i$’s to ensure
non-malleability, contrary to what we would have if we
had just concatenated Cramer-Shoup ciphertexts of the $M_i$’s. Such a ciphertext
$C = (u_i, v_i, e_i, w_i)_i$ is decrypted by $M_i = e_i/u_i^z$, after having checked the
validity of the ciphertext, $w_i = u_i^{x_1 + \theta y_1} v_i^{x_2 + \theta y_2}$, for $i = 1, \ldots, m$. This
multi-Cramer-Shoup encryption scheme, denoted MCS, is IND-CCA under the DDH assumption.
It even verifies a stronger property VIND-PO-CCA (for Vector-Indistinguishability
with Partial Opening under Chosen-Ciphertext Attacks), useful for the security
proof of our commitment E²C.
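
To make the role of the shared value θ concrete, the multi-Cramer-Shoup encryption and its validity check can be sketched as follows. The toy group, the SHA-256-based stand-in for the hash function H, and the names `keygen`, `theta`, `encrypt` and `decrypt` are assumptions made for illustration only; nothing here is secure at this size.

```python
import hashlib
import secrets

# Toy group (an assumption, not secure): squares modulo P = 2q + 1, prime order q.
P, q, g = 2039, 1019, 4


def keygen():
    """Return pk = (g, h, c, d, f) and sk = (x1, x2, y1, y2, z)."""
    h = pow(g, secrets.randbelow(q - 1) + 1, P)  # second generator, exponent discarded
    x1, x2, y1, y2, z = (secrets.randbelow(q) for _ in range(5))
    c = pow(g, x1, P) * pow(h, x2, P) % P
    d = pow(g, y1, P) * pow(h, y2, P) % P
    return (g, h, c, d, pow(g, z, P)), (x1, x2, y1, y2, z)


def theta(label, triples):
    """Stand-in for H: hash the label and all (u_i, v_i, e_i) into Z_q."""
    data = repr((label, triples)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q


def encrypt(pk, label, msgs):
    """Encrypt a vector of group elements with one theta shared by all w_i."""
    g_, h, c, d, f = pk
    rs = [secrets.randbelow(q) for _ in msgs]
    triples = [(pow(g_, r, P), pow(h, r, P), pow(f, r, P) * M % P)
               for r, M in zip(rs, msgs)]
    cd = c * pow(d, theta(label, triples), P) % P
    return [(u, v, e, pow(cd, r, P)) for (u, v, e), r in zip(triples, rs)]


def decrypt(sk, label, ct):
    """Check every w_i against the shared theta, then recover M_i = e_i / u_i^z."""
    x1, x2, y1, y2, z = sk
    th = theta(label, [(u, v, e) for u, v, e, _ in ct])
    out = []
    for u, v, e, w in ct:
        if w != pow(u, x1 + th * y1, P) * pow(v, x2 + th * y2, P) % P:
            return None  # invalid ciphertext
        out.append(e * pow(pow(u, z, P), P - 2, P) % P)
    return out
```

Since θ is computed over all components at once, reordering or dropping components changes θ, so the validity check of the remaining components is expected to fail; this is the intuition behind hashing the whole vector rather than each ciphertext separately.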

4.2 Construction 

In this section, we provide a concrete construction E²C, inspired from [10, 13, 2],
with the above multi-Cramer-Shoup encryption (as an extractable commitment
scheme) and the TC4 Haralambiev’s equivocable commitment scheme [23,
Section 4.1.4]. The latter will allow equivocability while the former will provide
extractability:

— $\mathsf{SetupComT}(1^{\mathfrak{K}})$ generates a pairing-friendly setting $(G_1, g_1, G_2, g_2, G_T, p, e)$,
with another independent generator $h_1$ of $G_1$. It then generates the
parameters of a Cramer-Shoup-based commitment in $G_1$: $x_1, x_2, y_1, y_2, z \stackrel{\$}{\leftarrow} \mathbb{Z}_p$
and $H \stackrel{\$}{\leftarrow} \mathcal{H}$, and sets $\mathsf{pk} = (g_1, h_1, c = g_1^{x_1} h_1^{x_2}, d = g_1^{y_1} h_1^{y_2}, f_1 = g_1^{z}, H)$.
It then chooses a random scalar $t \stackrel{\$}{\leftarrow} \mathbb{Z}_p$, and sets $T = g_2^t$. The CRS ρ
is set as (pk, T) and the trapdoor τ is the decryption key $(x_1, x_2, y_1, y_2, z)$
(a.k.a. extraction trapdoor) together with t (a.k.a. equivocation trapdoor).
For $\mathsf{SetupCom}(1^{\mathfrak{K}})$, the CRS is generated the same way, but forgetting the
scalars, and thus without any trapdoor;

— $\mathsf{Com}^{\ell}(\mathbf{M})$, for $\mathbf{M} = (M_i)_i \in \{0, 1\}^m$ and a label ℓ, works as follows:

• For $i = 1, \ldots, m$, it chooses a random scalar $r_{i,M_i} \stackrel{\$}{\leftarrow} \mathbb{Z}_p$, sets $r_{i,1-M_i} = 0$,
and commits to $M_i$, using the TC4 commitment scheme with $r_{i,M_i}$ as
randomness: $a_i = g_2^{r_{i,M_i}} T^{M_i}$, and sets $d_{i,j} = g_1^{r_{i,j}}$ for j = 0, 1, which makes
$d_{i,M_i}$ the opening value for $a_i$ to $M_i$. Let us also write $\mathbf{a} = (a_1, \ldots, a_m)$,
the tuple of commitments.

• For $i = 1, \ldots, m$ and j = 0, 1, it gets $\mathbf{b} = (b_{i,j})_{i,j} = 2m\text{-}\mathsf{MCS}^{\ell'}_{\mathsf{pk}}(\mathbf{d}; \mathbf{s})$,
that is $(u_{i,j}, v_{i,j}, e_{i,j}, w_{i,j})_{i,j}$, where $\mathbf{d} = (d_{i,j})_{i,j}$ is computed above,
$\mathbf{s} = (s_{i,j})_{i,j} \stackrel{\$}{\leftarrow} \mathbb{Z}_p^{2m}$, and $\ell' = (\ell, \mathbf{a})$.

The commitment is $C = (\mathbf{a}, \mathbf{b})$, and the opening information is the m-tuple
$\delta = (s_{1,M_1}, \ldots, s_{m,M_m})$.

— $\mathsf{VerCom}^{\ell}(C, \mathbf{M}, \delta)$ checks the validity of the ciphertexts $b_{i,M_i}$ with $s_{i,M_i}$ and
θ computed on the full ciphertext C, extracts $d_{i,M_i}$ from $b_{i,M_i}$ and $s_{i,M_i}$, and
checks whether $e(g_1, a_i/T^{M_i}) = e(d_{i,M_i}, g_2)$, for $i = 1, \ldots, m$.

— $\mathsf{SimCom}^{\ell}(\tau)$ takes as input the equivocation trapdoor, namely t, and outputs
$C = (\mathbf{a}, \mathbf{b})$ and $\mathsf{eqk} = \mathbf{s}$, where

• For $i = 1, \ldots, m$, it chooses a random scalar $r_{i,0} \stackrel{\$}{\leftarrow} \mathbb{Z}_p$, sets $r_{i,1} = r_{i,0} - t$,
and commits to both 0 and 1, using the TC4 commitment scheme with
$r_{i,0}$ and $r_{i,1}$ as respective randomness: $a_i = g_2^{r_{i,0}} = g_2^{r_{i,1}} T$ and $d_{i,j} = g_1^{r_{i,j}}$
for j = 0, 1, which makes $d_{i,j}$ the opening value for $a_i$ to the value
$j \in \{0, 1\}$. This leads to $\mathbf{a}$;

• $\mathbf{b}$ is built as above: $\mathbf{b} = (b_{i,j})_{i,j} = 2m\text{-}\mathsf{MCS}^{\ell'}_{\mathsf{pk}}(\mathbf{d}; \mathbf{s})$, with random scalars
$(s_{i,j})_{i,j}$.

— $\mathsf{OpenCom}^{\ell}(\mathsf{eqk}, C, \mathbf{M})$ simply extracts the useful values from $\mathsf{eqk} = \mathbf{s}$ to
make the opening value $\delta = (s_{1,M_1}, \ldots, s_{m,M_m})$ in order to open to $\mathbf{M} = (M_i)_i$.

— $\mathsf{ExtCom}^{\ell}(\tau, C)$ takes as input the extraction trapdoor, namely the
Cramer-Shoup decryption key. Given $\mathbf{b}$, it can decrypt all the $b_{i,j}$ into $d_{i,j}$ and check
whether $e(g_1, a_i/T^j) = e(d_{i,j}, g_2)$ or not. If, for each i, exactly one $j = M_i$
satisfies the equality, then the extraction algorithm outputs $(M_i)_i$, otherwise
(no correct decryption or ambiguity with several possibilities) it outputs ⊥.
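
As a sanity check on SimCom, the following worked equations use only the definitions above, namely $T = g_2^t$, $r_{i,1} = r_{i,0} - t$ and $d_{i,j} = g_1^{r_{i,j}}$, to confirm that each simulated $a_i$ passes the TC4 verification for both j = 0 and j = 1:

```latex
\begin{align*}
a_i &= g_2^{r_{i,0}} = g_2^{r_{i,1} + t} = g_2^{r_{i,1}} \, T, \\
e(g_1, a_i / T^{0}) &= e(g_1, g_2^{r_{i,0}}) = e(g_1^{r_{i,0}}, g_2) = e(d_{i,0}, g_2), \\
e(g_1, a_i / T^{1}) &= e(g_1, g_2^{r_{i,0} - t}) = e(g_1^{r_{i,1}}, g_2) = e(d_{i,1}, g_2).
\end{align*}
```

For an honestly generated commitment, by contrast, $d_{i,1-M_i} = g_1^{0} = 1$, so the check for $j = 1 - M_i$ would require $a_i / T^{1-M_i} = 1$, which happens only when $r_{i,M_i}$ takes one specific value in $\mathbb{Z}_p$. This is why ExtCom normally finds exactly one valid j per position on honest commitments.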


4.3 Security Properties 

The above commitment scheme E²C is a labeled E²-commitment, with both
strong-simulation-indistinguishability and strong-binding-extractability, under
the DDH assumptions in both $G_1$ and $G_2$. It is thus a UC-secure commitment
scheme. The stronger VIND-PO-CCA security notion for the encryption scheme is
required because the SCom/Com oracle does not only output the commitment
(and thus the ciphertexts) but also the opening values which include the
random coins of the encryption, but just for the plaintext components that are the
same in the two vectors, since the two vectors only differ for unnecessary data
(namely the $d_{i,1-M_i}$’s) in the security proof. More details can be found in the
full version [1].

5 SPHF-Friendly Commitments

5.1 Smooth Projective Hash Functions 

Projective hash function families were first introduced by Cramer and Shoup m, 
but we here use the definitions of Gennaro and Lindell [20], provided to build
secure password-based authenticated key exchange protocols, together with non- 
malleable commitments. 

Let X be the domain of these functions and let L be a certain subset of this 
domain (a language). A key property of these functions is that, for words C in 
L, their values can be computed by using either a secret hashing key hk or a 
public projection key hp but with a witness w of the fact that C is indeed in L: 

— HashKG(L) generates a hashing key hk for the language L; 

— ProjKG(hk, L, C) derives the projection key hp, possibly depending on the 
word C; 

— Hash(hk, L, C) outputs the hash value from the hashing key, on any word
$C \in X$;

— ProjHash(hp, L, C, w) outputs the hash value from the projection key hp, and
the witness w, for $C \in L$.

The correctness of the SPHF assures that if $C \in L$ with w a witness of this fact,
then Hash(hk, L, C) = ProjHash(hp, L, C, w). On the other hand, the security is
defined through the smoothness, which guarantees that, if $C \notin L$, Hash(hk, L, C)
is statistically indistinguishable from a random element, even knowing hp.

Note that HashKG and ProjKG can just depend partially on L (a superset L')
and not at all on C: we then note HashKG(L') and ProjKG(hk, L', $\perp$) (see [5] for
more details on GL-SPHF and KV-SPHF and language definitions).
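
To make the four algorithms concrete, here is a minimal Python sketch of a textbook SPHF for the Diffie-Hellman language $L = \{(g^r, h^r)\}$ with witness r. It is not the SPHF constructed in this paper: the toy group parameters, the fixed exponent used to derive h, and the function names are all assumptions chosen for illustration, and the group is far too small for any real use.

```python
import secrets

# Toy parameters (assumption, illustration only): P = 2*Q + 1 is a safe prime
# and G generates the subgroup of prime order Q.
P, Q, G = 1019, 509, 4


def hash_kg():
    """HashKG: sample a hashing key hk = (alpha, beta)."""
    return secrets.randbelow(Q), secrets.randbelow(Q)


def proj_kg(hk, h):
    """ProjKG: hp = g^alpha * h^beta; it does not depend on the word C."""
    alpha, beta = hk
    return pow(G, alpha, P) * pow(h, beta, P) % P


def hash_word(hk, word):
    """Hash: computed from hk on any word (u, v) of the domain X."""
    alpha, beta = hk
    u, v = word
    return pow(u, alpha, P) * pow(v, beta, P) % P


def proj_hash(hp, witness):
    """ProjHash: computed from hp and the witness r, for words of L only."""
    return pow(hp, witness, P)


h = pow(G, 123, P)            # second generator (hypothetical CRS element)
hk = hash_kg()
hp = proj_kg(hk, h)
r = 77                        # witness that the word below is in L
word_in_l = (pow(G, r, P), pow(h, r, P))
assert hash_word(hk, word_in_l) == proj_hash(hp, r)   # correctness
```

Correctness holds because $u^{\alpha} v^{\beta} = (g^{\alpha} h^{\beta})^{r}$ whenever $(u, v) = (g^r, h^r)$. For a word outside L, hp leaves one degree of freedom in hk undetermined, which is the source of smoothness.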

5.2 Robust Commitments 

For a long time, SPHFs have been used to implicitly check some statements, on
language membership, such as “C indeed encrypts x”. This easily extends to
perfectly binding commitments with labels: $L_x = \{(\ell, C) \mid \exists \delta, \mathsf{VerCom}^{\ell}(C, x, \delta) = 1\}$.
But when commitments are equivocable, this intuitively means that a commitment
C with the label $\ell$ contains any x and is thus in all the languages $L_x$.
In order to be able to use SPHFs with E²-commitments, we want the commitments
generated by polynomially-bounded adversaries to be perfectly binding,
and thus to belong to at most one language $L_x$. We thus need a robust verification
property for such E²-commitments.

Definition 5 (Robustness). One cannot produce a commitment and a label
that extracts to x' (possibly $x' = \perp$) such that there exists a valid opening data to
a different input x, even with oracle access to the extraction oracle (ExtCom) and
to fake commitments (using SCom). C is said $(t, \varepsilon)$-robust if $\mathsf{Succ}^{\mathsf{robust}}_{C}(t) \le \varepsilon$,
according to the experiment $\mathsf{Exp}^{\mathsf{robust}}_{C}(\mathcal{A})$ in Figure 5.

It is important to note that the latter experiment $\mathsf{Exp}^{\mathsf{robust}}_{C}(\mathcal{A})$ may not be run
in polynomial time. Robustness implies strong-binding-extractability.
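$\mathsf{Exp}^{\mathsf{robust}}_{C}(\mathcal{A})$

  $(\rho, \tau) \leftarrow \mathsf{SetupComT}(1^{\mathfrak{K}})$

  $(C, \ell) \leftarrow \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot),\, \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$

  $x' \leftarrow \mathsf{ExtCom}^{\ell}(\tau, C)$

  if $(\ell, x', C) \in \Lambda$ then return 0

  if $\exists x \ne x', \exists \delta, \mathsf{VerCom}^{\ell}(C, x, \delta)$ then return 1

  else return 0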


Fig. 5. Robustness 


5.3 Properties of SPHF-Friendly Commitments 

We are now ready to define SPHF-friendly commitments, which admit an SPHF
on the languages $L_x = \{(\ell, C) \mid \exists \delta, \mathsf{VerCom}^{\ell}(C, x, \delta) = 1\}$, and to discuss
them:

Definition 6 (SPHF-Friendly Commitments). An SPHF-friendly commitment
is an E²-commitment that admits an SPHF on the languages $L_x$, and that
is both strongly-simulation-indistinguishable and robust.

Let us consider such a family $\mathcal{F}$ of SPHFs on languages $L_x$ for $x \in X$, with X a
non-trivial set (with at least two elements), with hash values in the set G. From
the smoothness of the SPHF on $L_x$, one can derive the two following properties
on SPHF-friendly commitments, modeled by the experiments in Figure 6. The
first notion of smoothness deals with adversary-generated commitments, that are
likely perfectly binding from the robustness, while the second notion of
pseudo-randomness deals with simulated commitments, that are perfectly hiding. They
are inspired by the security games from [20].

In both security games, note that when hk and hp do not depend on x nor
on C, and when the smoothness holds even if the adversary can choose C after
having seen hp (i.e., the SPHF is actually a KV-SPHF [5]), they can be generated
from the beginning of the games, with hp given to the adversary much earlier.

Smoothness of SPHF-Friendly Commitments. If the adversary $\mathcal{A}$, with access to
the oracles SCom and ExtCom, outputs a fresh commitment $(\ell, C)$ that extracts
to $x' \leftarrow \mathsf{ExtCom}^{\ell}(\tau, C)$, then the robustness guarantees that for any $x \ne x'$,
$(\ell, C) \notin L_x$ (except with small probability), and thus the distribution of the
hash value is statistically indistinguishable from the random distribution, even
when knowing hp. In the experiment $\mathsf{Exp}^{\mathsf{c\text{-}smooth}}_{C,\mathcal{F}}(\mathcal{A})$, we let the adversary choose
x, and we have: $\mathsf{Adv}^{\mathsf{c\text{-}smooth}}_{C,\mathcal{F}}(t) \le \mathsf{Succ}^{\mathsf{robust}}_{C}(t) + \mathsf{Adv}^{\mathsf{smooth}}_{\mathcal{F}}$.

Pseudo-Randomness of SPHF on Robust Commitments. If the adversary $\mathcal{A}$ is
given a commitment C by SCom on x' with label $\ell$, both adversary-chosen,
even with access to the oracles SCom and ExtCom, then for any x, it cannot
distinguish the hash value of $(\ell, C)$ on language $L_x$ from a random value, even
being given hp, since C could have been generated as $\mathsf{Com}^{\ell}(x'')$ for some $x'' \ne x$,
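$\mathsf{Exp}^{\mathsf{c\text{-}smooth}\text{-}b}_{C,\mathcal{F}}(\mathcal{A})$

  $(\rho, \tau) \leftarrow \mathsf{SetupComT}(1^{\mathfrak{K}})$

  $(C, \ell, x, \mathsf{state}) \leftarrow \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot),\, \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$; $x' \leftarrow \mathsf{ExtCom}^{\ell}(\tau, C)$

  if $(\ell, x', C) \in \Lambda$ then return 0

  $hk \leftarrow \mathsf{HashKG}(L_x)$; $hp \leftarrow \mathsf{ProjKG}(hk, L_x, (\ell, C))$

  if $b = 0 \lor x' = x$ then $H \leftarrow \mathsf{Hash}(hk, L_x, (\ell, C))$ else $H \leftarrow G$

  return $\mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot),\, \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\mathsf{state}, hp, H)$


$\mathsf{Exp}^{\mathsf{c\text{-}ps\text{-}rand}\text{-}b}_{C,\mathcal{F}}(\mathcal{A})$

  $(\rho, \tau) \leftarrow \mathsf{SetupComT}(1^{\mathfrak{K}})$

  $(\ell, x, x', \mathsf{state}) \leftarrow \mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot),\, \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\rho)$; $C \leftarrow \mathsf{SCom}^{\ell}(\tau, x')$

  $hk \leftarrow \mathsf{HashKG}(L_x)$; $hp \leftarrow \mathsf{ProjKG}(hk, L_x, (\ell, C))$
  if $b = 0$ then $H \leftarrow \mathsf{Hash}(hk, L_x, (\ell, C))$ else $H \leftarrow G$
  return $\mathcal{A}^{\mathsf{SCom}^{\cdot}(\tau, \cdot),\, \mathsf{ExtCom}^{\cdot}(\tau, \cdot)}(\mathsf{state}, C, hp, H)$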


Fig. 6. Smoothness and Pseudo-Randomness 


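which excludes it from belonging to $L_x$, under the robustness. In the experiment
$\mathsf{Exp}^{\mathsf{c\text{-}ps\text{-}rand}}_{C,\mathcal{F}}(\mathcal{A})$, we let the adversary choose $(\ell, x)$, and we have:
$\mathsf{Adv}^{\mathsf{c\text{-}ps\text{-}rand}}_{C,\mathcal{F}}(t) \le \mathsf{Adv}^{\mathsf{s\text{-}sim\text{-}ind}}_{C}(t) + \mathsf{Succ}^{\mathsf{robust}}_{C}(t) + \mathsf{Adv}^{\mathsf{smooth}}_{\mathcal{F}}$.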

5.4 Our Commitment Scheme E²C is SPHF-Friendly

In order to be SPHF-friendly, the commitment first needs to be
strongly-simulation-indistinguishable and robust. We have already shown the former property,
and the latter is also proven in the full version [1]. One additionally needs an
SPHF able to check the verification equation: using the notations from
Section 4.2, C = (a, b) is a commitment of $M = (M_i)_i$, if there exist
$\delta = (s_{1,M_1}, \ldots, s_{m,M_m})$ and $(d_{1,M_1}, \ldots, d_{m,M_m})$ such that
$b_{i,M_i} = (u_{i,M_i}, v_{i,M_i}, e_{i,M_i}, w_{i,M_i}) = \mathsf{CS}_{pk}(d_{i,M_i}, \theta; s_{i,M_i})$
(with a particular $\theta$) and $e(g_1, a_i/T^{M_i}) = e(d_{i,M_i}, g_2)$, for
$i = 1, \ldots, m$. Since e is non-degenerate, we can eliminate the need of $d_{i,M_i}$, by
lifting everything in $\mathbb{G}_T$, and checking that, first, the ciphertexts are all valid:

  $e(u_{i,M_i}, g_2) = e(g^{s_{i,M_i}}, g_2)$    $e(v_{i,M_i}, g_2) = e(h^{s_{i,M_i}}, g_2)$

  $e(w_{i,M_i}, g_2) = e((c d^{\theta})^{s_{i,M_i}}, g_2)$

and, second, the plaintexts satisfy the appropriate relations:

  $e(e_{i,M_i}, g_2) = e(f^{s_{i,M_i}}, g_2) \cdot e(g_1, a_i/T^{M_i})$.

From these expressions we derive several constructions of such SPHFs in the
full version [1], and focus here on the most interesting ones for the following
applications:
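— First, when C is sent in advance (known when generating hp), as in the
OT protocol described in Section 7, for $hk = (\eta, \alpha, \beta, \mu, \varepsilon) \leftarrow \mathbb{Z}_p^5$, and
$hp = (\varepsilon, hp_1 = g^{\eta} h^{\alpha} f^{\beta} (c d^{\theta})^{\mu}) \in \mathbb{Z}_p \times \mathbb{G}_1$:

  $H = \mathsf{Hash}(hk, M, C)$

  $= \prod_i \left( e(u_{i,M_i}^{\eta} v_{i,M_i}^{\alpha}, g_2) \cdot \left(e(e_{i,M_i}, g_2)/e(g_1, a_i/T^{M_i})\right)^{\beta} \cdot e(w_{i,M_i}^{\mu}, g_2) \right)^{\varepsilon^{i-1}}$

  $= e\left(\prod_i hp_1^{s_{i,M_i} \varepsilon^{i-1}}, g_2\right) = \mathsf{ProjHash}(hp, M, C, \delta) = H'$.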

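— Then, when C is not necessarily known for computing hp, as in the
one-round PAKE described in Section 6, for $hk = (\eta_{i,1}, \eta_{i,2}, \alpha_i, \beta_i, \mu_i)_i \leftarrow \mathbb{Z}_p^{5m}$,
and $hp = (hp_{i,1} = g^{\eta_{i,1}} h^{\alpha_i} f^{\beta_i} c^{\mu_i},\; hp_{i,2} = g^{\eta_{i,2}} d^{\mu_i})_i \in \mathbb{G}_1^{2m}$:

  $H = \mathsf{Hash}(hk, M, C)$

  $= \prod_i \left( e(u_{i,M_i}^{\eta_{i,1} + \theta \eta_{i,2}} \cdot v_{i,M_i}^{\alpha_i}, g_2) \cdot \left(e(e_{i,M_i}, g_2)/e(g_1, a_i/T^{M_i})\right)^{\beta_i} \cdot e(w_{i,M_i}^{\mu_i}, g_2) \right)$
  $= e\left(\prod_i (hp_{i,1}\, hp_{i,2}^{\theta})^{s_{i,M_i}}, g_2\right) = \mathsf{ProjHash}(hp, M, C, \delta) = H'$.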
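
The algebra behind the second SPHF can be checked numerically on a single Cramer-Shoup ciphertext. The Python sketch below is a simplification made under stated assumptions: it drops the pairing and the $a_i/T^{M_i}$ component (the plaintext is divided out directly), uses one ciphertext instead of m, derives $\theta$ from SHA-256, and works in a toy group. The key layout (g, h, f, c, d) follows the symbols used above, but every function name is hypothetical.

```python
import hashlib
import secrets

P, Q, G = 1019, 509, 4  # toy group (assumption): G has prime order Q modulo P


def rand_elem():
    """Random non-trivial element of the order-Q subgroup."""
    return pow(G, 1 + secrets.randbelow(Q - 1), P)


def theta_of(label, u, v, e):
    """Stand-in for the collision-resistant hash that produces theta."""
    data = f"{label}|{u}|{v}|{e}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def encrypt(pk, label, msg, s):
    """Cramer-Shoup-style ciphertext (u, v, e, w) of msg with coins s."""
    u, v = pow(pk["g"], s, P), pow(pk["h"], s, P)
    e = pow(pk["f"], s, P) * msg % P
    theta = theta_of(label, u, v, e)
    w = pow(pk["c"] * pow(pk["d"], theta, P) % P, s, P)
    return (u, v, e, w), theta


def hash_kg():
    return tuple(secrets.randbelow(Q) for _ in range(5))  # eta1, eta2, alpha, beta, mu


def proj_kg(hk, pk):
    """hp is computed without knowing the ciphertext (KV-style)."""
    eta1, eta2, alpha, beta, mu = hk
    hp1 = (pow(pk["g"], eta1, P) * pow(pk["h"], alpha, P)
           * pow(pk["f"], beta, P) * pow(pk["c"], mu, P)) % P
    hp2 = pow(pk["g"], eta2, P) * pow(pk["d"], mu, P) % P
    return hp1, hp2


def hash_ct(hk, ct, theta, msg):
    eta1, eta2, alpha, beta, mu = hk
    u, v, e, w = ct
    e_over_msg = e * pow(msg, -1, P) % P      # removes the claimed plaintext
    return (pow(u, (eta1 + theta * eta2) % Q, P) * pow(v, alpha, P)
            * pow(e_over_msg, beta, P) * pow(w, mu, P)) % P


def proj_hash(hp, theta, s):
    hp1, hp2 = hp
    return pow(hp1 * pow(hp2, theta, P) % P, s, P)


pk = {"g": G, "h": rand_elem(), "f": rand_elem(), "c": rand_elem(), "d": rand_elem()}
msg, s = pow(G, 42, P), secrets.randbelow(Q)
ct, theta = encrypt(pk, "label", msg, s)
hk = hash_kg()
assert hash_ct(hk, ct, theta, msg) == proj_hash(proj_kg(hk, pk), theta, s)
```

Both sides equal $g^{s(\eta_1 + \theta\eta_2)} h^{s\alpha} f^{s\beta} (c d^{\theta})^{s\mu}$, which is the single-ciphertext, pairing-free analogue of the product above.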

5.5 Complexity and Comparisons 

As summarized in Table 1, the communication complexity is linear in $m \cdot \mathfrak{K}$ (where
m is the bit-length of the committed value and $\mathfrak{K}$ is the security parameter),
which is much better than [5] that was linear in $m \cdot \mathfrak{K}^2$, but asymptotically worse
than the two proposals in [19] that are linear in $\mathfrak{K}$, and thus independent of m
(as long as $m = O(\mathfrak{K})$).

Basically, the first scheme in [19] consists of a Cramer-Shoup-like encryption
C of the message x, and a perfectly-sound Groth-Sahai NIZK $\pi$ that C
contains x. The actual commitment is C and the opening value on x is $\delta = \pi$.
The trapdoor-setup provides the Cramer-Shoup decryption key and changes the
Groth-Sahai setup to the perfectly-hiding setting. The indistinguishable setups of
the Groth-Sahai mixed commitments ensure the setup-indistinguishability. The
extraction algorithm uses the Cramer-Shoup decryption algorithm, while the
equivocation uses the simulator of the NIZK. The IND-CCA security notion for
C and the computational soundness of $\pi$ make it strongly-binding-extractable;
the IND-CCA security notion and the zero-knowledge property of the NIZK
provide the strong-simulation-indistinguishability. It is thus UC-secure. However,
the verification is not robust: because of the perfectly-hiding setting of
Groth-Sahai proofs, for any ciphertext C and for any message x, there exists a proof $\pi$
that makes the verification of C on x succeed. As a consequence, it is not SPHF-friendly.
The second construction is in the same vein: neither can be used in the following
applications.

6 Password-Authenticated Key Exchange 

6.1 A Generic Construction 

The ideal functionality of a Password- Authenticated Key Exchange (PAKE) 
has been proposed in m- In Figure [7] we describe a one-round PAKE that 
Table 1. Comparison with existing non-interactive UC-secure commitments with a
single global CRS (m = bit-length of the committed value, $\mathfrak{K}$ = security parameter)

             SPHF-Friendly   Commitment C                                 Decommitment δ                  Assumption
[5] (a)      yes             $(m + 16m\mathfrak{K}) \times \mathbb{G}$                 $2m\mathfrak{K} \times \mathbb{Z}_p$           DDH
[19], 1      no              $5 \times \mathbb{G}$                                     $16 \times \mathbb{G}$                         DLIN
[19], 2      no              $37 \times \mathbb{G}$                                    $3 \times \mathbb{G}$                          DLIN
this paper   yes             $8m \times \mathbb{G}_1 + m \times \mathbb{G}_2$          $m \times \mathbb{Z}_p$                        SXDH

(a) slight variant without one-time signature but using labels for the IND-CCA
security of the multi-Cramer-Shoup ciphertexts, as in our new scheme, and supposing
that an element in the cyclic group $\mathbb{G}$ has size $2\mathfrak{K}$, to withstand generic attacks.


is UC-secure against adaptive adversaries, assuming erasures. It can be built
from any SPHF-friendly commitment scheme (that is E²,
strongly-simulation-indistinguishable, and robust as described in Section 5), if the SPHF is actually
a KV-SPHF [5] and the algorithms HashKG and ProjKG do not need to know the
committed value $\pi$ (nor the word $(\ell, C)$ itself). We thus denote $L_{\pi}$ the language
of the pairs $(\ell, C)$, where C is a commitment that opens to $\pi$ under the label $\ell$,
and L the union of all the $L_{\pi}$ (L does not depend on $\pi$).

Theorem 7. The Password-Authenticated Key Exchange described in Figure 7
is UC-secure in the presence of adaptive adversaries, assuming erasures, as soon
as the commitment scheme is SPHF-friendly with a KV-SPHF.


6.2 Concrete Instantiation 

Using our commitment E²C introduced in Section 4.2 together with the second SPHF
described in Section 5.4 (which satisfies the above requirements for HashKG and
ProjKG), one gets a quite efficient protocol, described in the full version [1].
More precisely, for m-bit passwords, each player has to send $hp \in \mathbb{G}_1^{2m}$ and


CRS: $\rho \leftarrow \mathsf{SetupCom}(1^{\mathfrak{K}})$.

Protocol execution by $P_i$ with $\pi_i$:

1. $P_i$ generates $hk_i \leftarrow \mathsf{HashKG}(L)$, $hp_i \leftarrow \mathsf{ProjKG}(hk_i, L, \perp)$
and erases any random coins used for the generation

2. $P_i$ computes $(C_i, \delta_i) \leftarrow \mathsf{Com}^{\ell_i}(\pi_i)$ with $\ell_i = (sid, P_i, P_j, hp_i)$

3. $P_i$ stores $\delta_i$, completely erases random coins used by Com
and sends $hp_i$, $C_i$ to $P_j$

Key computation: Upon receiving $hp_j$, $C_j$ from $P_j$

1. $P_i$ computes $H'_i \leftarrow \mathsf{ProjHash}(hp_j, L_{\pi_i}, (\ell_i, C_i), \delta_i)$

and $H_j \leftarrow \mathsf{Hash}(hk_i, L_{\pi_i}, (\ell_j, C_j))$ with $\ell_j = (sid, P_j, P_i, hp_j)$

2. $P_i$ computes $sk_i = H'_i \cdot H_j$.


Fig. 7. UC-Secure PAKE from an SPHF-Friendly Commitment 




Table 2. Comparison with existing UC-secure PAKE schemes

(Only fragments of this table survive: the row for this paper, marked "yes" in both
surviving columns, and part of a footnote about a variant with a one-time signature
(public key size and signature size) used to link the flows in the PAKE.)

$C \in \mathbb{G}_1^{8m} \times \mathbb{G}_2^{m}$, which means 10m elements from $\mathbb{G}_1$ and m elements from $\mathbb{G}_2$.
In Table 2, we compare our new scheme with some previous UC-secure PAKE.
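
The key computation of Figure 7 can be illustrated with a small Python sketch. It is not the protocol of this paper: the "commitment" is plain ElGamal encryption of $g^{\pi}$ (neither equivocable nor extractable in the required sense, so not SPHF-friendly and not UC-secure), labels are omitted, the group is a toy one, and all names are hypothetical. The sketch only shows why both parties derive the same $sk = H'_i \cdot H_j$ when their passwords match.

```python
import secrets

P, Q, G = 1019, 509, 4          # toy group (assumption)
H = pow(G, 321, P)              # toy CRS element; its exponent must be unknown in practice


def commit(pw, r):
    """Toy stand-in for Com: ElGamal encryption of g^pw with coins r (the witness)."""
    return pow(G, r, P), pow(H, r, P) * pow(G, pw, P) % P


def hash_kg():
    return secrets.randbelow(Q), secrets.randbelow(Q)


def proj_kg(hk):
    """KV-style projection key: independent of the password and of the commitment."""
    a, b = hk
    return pow(G, a, P) * pow(H, b, P) % P


def hash_word(hk, pw, com):
    """Hash of the peer's commitment on the language 'commits to pw'."""
    a, b = hk
    u, e = com
    e_over = e * pow(pow(G, pw, P), -1, P) % P
    return pow(u, a, P) * pow(e_over, b, P) % P


def proj_hash(hp, r):
    return pow(hp, r, P)


class Party:
    def __init__(self, pw):
        self.pw = pw
        self.hk = hash_kg()
        self.hp = proj_kg(self.hk)
        self.r = secrets.randbelow(Q)
        self.com = commit(pw, self.r)

    def flow(self):
        """The single message sent to the peer: (hp, C)."""
        return self.hp, self.com

    def session_key(self, peer_flow):
        hp_peer, com_peer = peer_flow
        own = proj_hash(hp_peer, self.r)                  # H'_i, from own witness
        peer = hash_word(self.hk, self.pw, com_peer)      # H_j, from own hashing key
        return own * peer % P


alice, bob = Party(pw=7), Party(pw=7)
assert alice.session_key(bob.flow()) == bob.session_key(alice.flow())
```

With equal passwords both keys equal $hp_j^{r_i} \cdot hp_i^{r_j}$; with different passwords the term computed by Hash no longer matches the peer's ProjHash value, so the keys agree only by chance.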

7 Oblivious Transfer 

7.1 A Generic Construction 

The ideal functionality of an Oblivious Transfer (OT) protocol is depicted in the
full version [1]. It is inspired from jT7T. In Figure 8, we describe a 3-round OT
that is UC-secure against adaptive adversaries, and a 2-round variant which is
UC-secure against static adversaries. They can be built from any SPHF-friendly
commitment scheme, where $L_t$ is the language of the commitments that open
to t under the associated label $\ell$, and from any IND-CPA encryption scheme
$\mathcal{E}$ = (Setup, KeyGen, Encrypt, Decrypt) with plaintext size at least $\mathfrak{K}$, and from
any Pseudo-Random Generator (PRG) F with input size equal to plaintext size,
and output size equal to the size of the messages in the database. Details on
encryption schemes and PRGs can be found in the full version [1]. Notice the
adaptive version can be seen as a variant of the static version where the last flow
is sent over a somewhat secure channel, as in H3; and the preflow and pk and c
are used to create this somewhat secure channel.

Theorem 8. The two Oblivious Transfer schemes described in Figure 8 are
UC-secure in the presence of adaptive adversaries and static adversaries respectively,
assuming reliable erasures and authenticated channels, as soon as the
commitment scheme is SPHF-friendly.

7.2 Concrete Instantiation and Comparison 

Using our commitment E²C introduced in Section 4.2 together with the first SPHF
described in Section 5.4, one gets the protocol described in the full version [1], where
the number of bits of the committed value is $m = \lceil \log k \rceil$. For the statically secure
version, the communication cost is, in addition to the database m that is sent
in M in a masked way, 1 element of $\mathbb{Z}_p$ and k elements of $\mathbb{G}_1$ (for hp, by using
the same scalar $\varepsilon$ for all $hp_t$'s) for the sender, while the receiver sends $\lceil \log k \rceil$
CRS: $\rho \leftarrow \mathsf{SetupCom}(1^{\mathfrak{K}})$, $\mathsf{param} \leftarrow \mathsf{Setup}(1^{\mathfrak{K}})$.

Pre-flow (for adaptive security only):

1. $P_i$ generates a key pair $(pk, sk) \leftarrow \mathsf{KeyGen}(\mathsf{param})$ for $\mathcal{E}$

2. $P_i$ stores sk, completely erases random coins used by KeyGen, and sends pk to $P_j$

Index query on s:

1. $P_j$ chooses a random value S, computes $R \leftarrow F(S)$ and encrypts S under pk:
$c \leftarrow \mathsf{Encrypt}(pk, S)$ (for adaptive security only; for static security: $c = \perp$, R = 0)

2. $P_j$ computes $(C, \delta) \leftarrow \mathsf{Com}^{\ell}(s)$ with $\ell = (sid, ssid, P_i, P_j)$

3. $P_j$ stores $\delta$ and completely erases R, S and random coins used by Com and Encrypt
and sends C and c to $P_i$

Database input $(m_1, \ldots, m_k)$:

1. $P_i$ decrypts $S \leftarrow \mathsf{Decrypt}(sk, c)$ and gets $R \leftarrow F(S)$ (for static security: R = 0)

2. $P_i$ computes $hk_t \leftarrow \mathsf{HashKG}(L_t)$, $hp_t \leftarrow \mathsf{ProjKG}(hk_t, L_t, (\ell, C))$,

$K_t \leftarrow \mathsf{Hash}(hk_t, L_t, (\ell, C))$, and $M_t \leftarrow R \oplus K_t \oplus m_t$, for $t = 1, \ldots, k$

3. $P_i$ erases everything except $(hp_t, M_t)_{t=1,\ldots,k}$ and sends them over a secure channel

Data recovery:

Upon receiving $(hp_t, M_t)_{t=1,\ldots,k}$, $P_j$ computes $K_s \leftarrow \mathsf{ProjHash}(hp_s, L_s, (\ell, C), \delta)$
and gets $m_s \leftarrow R \oplus K_s \oplus M_s$.


Fig. 8. UC-Secure 1-out-of-k OT from an SPHF-Friendly Commitment (for Adaptive
and Static Security)

elements of $\mathbb{G}_2$ (for a) and $8\lceil \log k \rceil$ elements of $\mathbb{G}_1$ (for b), in only two rounds. In
the particular case of k = 2, the scalar can be avoided since the message consists
of 1 bit, so our construction just requires: 2 elements from $\mathbb{G}_1$ for the sender, and
1 from $\mathbb{G}_2$ and 8 from $\mathbb{G}_1$ for the receiver, in two rounds. For the same security
level (static corruptions in the CRS, with erasures), the best known solution
from US] required sending at least 23 group elements and 7 scalars, in 4 rounds.
If adaptive security is required, our construction requires 3 additional elements
in $\mathbb{G}_1$ and 1 additional round, which gives a total of 13 elements in $\mathbb{G}_1$, in 3
rounds. This is also more efficient than the best known solution from [TS], which
requires 26 group elements and 7 scalars, in 4 rounds.
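
The masking and recovery steps of Figure 8 reduce to XOR arithmetic once the hash values are available as bit strings. The Python sketch below shows only that arithmetic: the keys $K_t$ are replaced by random byte strings standing in for Hash/ProjHash outputs (in the protocol they are equal only for t = s), all strings are assumed to have the same length, and the helper names are hypothetical.

```python
import secrets


def xor(*parts: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytes(len(parts[0]))
    for part in parts:
        out = bytes(a ^ b for a, b in zip(out, part))
    return out


def mask_database(messages, keys, r):
    """Sender side: M_t = R xor K_t xor m_t for every line t of the database."""
    return [xor(r, k_t, m_t) for k_t, m_t in zip(keys, messages)]


def recover(masked, s, k_s, r):
    """Receiver side: m_s = R xor K_s xor M_s for the queried index s."""
    return xor(r, k_s, masked[s])


database = [b"line-0..", b"line-1..", b"line-2.."]
keys = [secrets.token_bytes(8) for _ in database]   # stand-ins for K_t
r = secrets.token_bytes(8)                          # R = F(S); all zeros in the static variant
masked = mask_database(database, keys, r)
assert recover(masked, 1, keys[1], r) == b"line-1.."
```

The receiver can unmask only the line whose key it can recompute, which in the protocol is the single index s for which it holds a valid opening $\delta$.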
Abstract. Revocation and key evolving paradigms are central issues in cryptography,
and in PKI in particular. A novel concern related to these areas was raised
in the recent work of Sahai, Seyalioglu, and Waters (Crypto 2012) who noticed 
that revoking past keys should at times (e.g., the scenario of cloud storage) be ac- 
companied by revocation of past ciphertexts (to prevent unread ciphertexts from 
being read by revoked users). They introduced revocable-storage attribute-based 
encryption (RS-ABE) as a good access control mechanism for cloud storage. RS- 
ABE protects against the revoked users not only the future data by supporting 
key-revocation but also the past data by supporting ciphertext-update, through 
which a ciphertext at time T can be updated to a new ciphertext at time T + 1 
using only the public key. Motivated by this pioneering work, we ask whether 
it is possible to have a modular approach, which includes time-managed
ciphertext update as a primitive. We call encryption which supports
this primitive a “self-updatable encryption” (SUE). We then suggest a modular
cryptosystems design methodology based on three sub-components: a primary 
encryption scheme, a key-revocation mechanism, and a time-evolution mecha- 
nism which controls the ciphertext self-updating via an SUE method, coordinated 
with the revocation (when needed). Our goal in this is to allow the self-updating 
ciphertext component to take part in the design of new and improved cryptosys- 
tems and protocols in a flexible fashion. Specifically, we achieve the following 
results: 

- We first introduce a new cryptographic primitive called self-updatable en- 
cryption (SUE), realizing a time-evolution mechanism. We also construct an 
SUE scheme and prove its full security under static assumptions. 

- Following our modular approach, we present a new RS-ABE scheme with 
shorter ciphertexts than that of Sahai et al. and prove its security. The length 
efficiency is mainly due to our SUE scheme and the underlying modularity. 

- We apply our approach to predicate encryption (PE) supporting attribute- 
hiding property, and obtain a revocable-storage PE (RS-PE) scheme that is 
selectively-secure. 

- We further demonstrate that SUE is of independent interest, by showing it 
can be used for timed-release encryption (and its applications), and for aug- 
menting key-insulated encryption with forward-secure storage. 

Keywords: Public-key encryption, Attribute-based encryption, Predicate encryption,
Self-updatable encryption, Revocation, Key evolving systems, Cloud storage.

K. Sako and P. Sarkar (Eds.): ASIACRYPT 2013 Part I, LNCS 8269, pp. 235-254, 2013.
1 Introduction

Cloud data storage has many advantages: A virtually unlimited amount of space can be 
flexibly allocated with very low costs, and storage management, including back-up and 
recovery, has never been easier. More importantly, it provides great accessibility: users 
in any geographic location can access their data through the Internet. However, when 
an organization is to store privacy-sensitive data, existing cloud services do not seem
to provide a good security guarantee yet (since the area is in its infancy). In particular, 
access control is one of the greatest concerns, that is, the sensitive data items have to be 
protected from any illegal access, whether it comes from outsiders or even from insiders 
without proper access rights. 

One possible approach for this problem is to use attribute-based encryption (ABE) 
that provides cryptographically enhanced access control functionality in encrypted data 
QHQjilGO). In ABE, each user in the system is issued a private key from an authority 
that reflects their attributes (or credentials), and each ciphertext specifies access to itself 
as a boolean formula over a set of attributes. A user will be able to decrypt a ciphertext 
if the attributes associated with their private key satisfy the boolean formula associated 
with the ciphertext. To deal with the change of user’s credentials that takes place over 
time, revocable ABE (R-ABE) O has been suggested, in which a user’s private key can 
be revoked. In R-ABE, a key generation authority uses broadcast encryption to allow 
legitimate users to update their keys. Therefore, a revoked user cannot learn any partial 
information about the messages encrypted when the ciphertext is created after the time 
of revocation (or after the user’s credential has expired). 

As pointed out by Sahai, Seyalioglu, and Waters [29], R-ABE alone does not suffice
in managing dynamic credentials for cloud storage. In fact, R-ABE cannot prevent a 
revoked user from accessing ciphertexts that were created before the revocation, since 
the old private key of the revoked user is enough to decrypt these ciphertexts. To over- 
come this, they introduced a novel revocable-storage ABE (RS-ABE) which solves this 
issue by supporting not only the revocation functionality but also the ciphertext update 
functionality such that a ciphertext at any arbitrary time T can be updated to a new ci- 
phertext at time T + 1 by any party just using the public key (in particular, by the cloud 
servers). 

Key-revocation and key evolution are general sub-areas in cryptosystems design, and
ciphertext-update is a new concern which may be useful elsewhere. So, in this paper, 
we ask natural questions: 

Can we achieve key-revocation and ciphertext-update in other encryption schemes? 

Can we use ciphertext-update as an underlying primitive by itself? 

We note that, in contrast to our questions, the methodology that Sahai et al. [29] used
to achieve ciphertext-update is customized to the context of ABE. In particular, they 
first added ciphertext-delegation to ABE, and then, they represented time as a set of 
attributes, and by doing so they reduced ciphertext-update to ciphertext-delegation. 
1.1 Our Results

We address the questions by taking a modular approach, that is, by actually constructing 
a cryptographic component realizing each of the two functionalities: key revocation and 
ciphertext update. In particular, our design approach is as follows: 

- The overall system has three components: a primary encryption scheme (i.e., ABE or 
some other encryption scheme), a key-revocation mechanism, and a time-evolution 
mechanism. 

- We combine the components by putting the key-revocation mechanism in the center 
and connecting it with the other two. This is because the revoked users need to be 
taken into account both in the decryption of the primary scheme and in the time- 
evolution of ciphertexts. 

There are a few potential benefits to this approach. First, we may be able to achieve key- 
revocation and time-evolution mechanisms, independently of the primary encryption 
scheme. Secondly, each mechanism may be of independent interest and be used in other 
interesting scenarios. Thirdly, looking at each mechanism alone may open the door to 
various optimizations and flexibilities of implementations. 

Time-Evolution Mechanism: Self-Updatable Encryption. We first formulate a new
cryptographic primitive called self-updatable encryption (SUE), realizing a
time-evolution mechanism. In SUE, a ciphertext and a private key are associated with time
$T_c$ and $T_k$, respectively. A user who has a private key with time $T_k$ can decrypt the
ciphertext with time $T_c$ if $T_c \le T_k$. Additionally, anyone can update the ciphertext with
time $T_c$ to a new ciphertext with new time $T_c'$ such that $T_c < T_c'$. We construct an SUE
scheme in composite order bilinear groups. In our SUE scheme, a ciphertext consists
of $O(\log T_{max})$ group elements, and a private key consists of $O(\log T_{max})$ group
elements, where $T_{max}$ is the maximum time period in the system. Our SUE scheme is
fully secure under static assumptions by using the dual system encryption technique of
Waters.

RS-ABE with Shorter Ciphertexts. Following the general approach above, we
construct a new RS-ABE scheme and prove that it is fully secure under static assumptions.
In particular, we take the ciphertext-policy ABE (CP-ABE) scheme of Lewko et al.
as the primary encryption scheme, and combine it with our SUE scheme and a
revocation mechanism. The revocation mechanism follows the design principle of Boldyreva,
Goyal, and Kumar [3] that uses the complete subtree method to securely update the keys
of the non-revoked users. Compared with the scheme of Sahai et al. [29], our scheme
has a shorter ciphertext length consisting of $O(\ell + \log T_{max})$ group elements, where $\ell$
is the number of rows in the ABE access structure; a ciphertext in their scheme consists of
$O(\ell \log T_{max} + \log^2 T_{max})$ group elements (reflecting the fact that time is dealt with in a
less modular fashion there, while we employ the more separated SUE component which
is length efficient).

Revocable-Storage Predicate Encryption. We apply our approach to predicate
encryption (PE) and give the first RS-PE scheme. In particular, taking the PE scheme of
Park [26] as the primary encryption scheme, we combine it with the same revocation
functionality and (a variant of) our SUE scheme. The scheme is in prime-order groups
and is shown to be selectively secure (a previously used weaker notion than (full)
security, where the adversary selects the target of attack at the start). Obviously, compared
with the RS-ABE scheme, the RS-PE scheme is a PE system and, thus, additionally
supports the attribute-hiding property: even a decryptor cannot obtain information about the
attributes x of a ciphertext except f(x), where f is the predicate of its private key.
Other Systems. These are discussed below in this section. 

1.2 Our Technique 

To devise our SUE scheme, we use a full binary tree structure to represent time. The
idea of using the full binary tree for time was already used by Canetti et al. [8] to
construct a forward-secure public-key encryption (FSE) scheme. However, our scheme
greatly differs on a technical level from their approach; in our scheme, a ciphertext is
updated from time $T_i$ to time $T_j > T_i$, whereas in their scheme a private key is updated
from time $T_i$ to time $T_j > T_i$. We start from the HIBE scheme of Boneh and Boyen,
and then construct a ciphertext delegatable encryption (CDE) scheme, by switching
the structure of private keys with that of ciphertexts; our goal is to support ciphertext
delegation instead of private key delegation. In CDE, each ciphertext is associated with
a tree node, and so is each private key. A ciphertext at a tree node $v_c$ can be decrypted by
any key with a tree node $v_k$, where $v_k$ is a descendant (or self) of $v_c$. We note that
the CDE scheme may be of independent interest. The ciphertext delegation property of
CDE allows us to construct an SUE scheme. An SUE ciphertext at time $T_i$ consists of
multiple CDE ciphertexts in order to support ciphertext-update for every $T_j$ such that
$T_j > T_i$. We were able to reduce the number of group elements in the SUE ciphertext by
carefully reusing the randomness of CDE ciphertexts.

Our key-revocation mechanism, as mentioned above, uses a symmetric-key
broadcast encryption scheme to periodically broadcast update keys to non-revoked users.
A set of non-revoked users is represented as a node (more exactly the leaves of the
subtree rooted at the node) in a tree, following the complete subtree (CS) scheme of
Naor et al. [22]. So, we use two different trees in this paper, i.e., one for representing
time in the ciphertext domain, and the other for managing non-revoked users in the key
domain.
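
The way a set of non-revoked users is represented by tree nodes can be sketched with the standard complete subtree cover computation. The Python code below is a generic illustration, not code from this paper: users are the leaves of a full binary tree stored with heap-style indices (root 1, children 2v and 2v+1), and the function names are hypothetical.

```python
def cs_cover(depth, revoked_leaves):
    """Return node indices whose subtrees cover exactly the non-revoked leaves."""
    first_leaf = 1 << depth
    steiner = set()                      # all ancestors of revoked leaves
    for leaf in revoked_leaves:
        node = first_leaf + leaf
        while node >= 1 and node not in steiner:
            steiner.add(node)
            node //= 2
    if not steiner:
        return [1]                       # nobody revoked: the root covers everyone
    cover = []
    for node in sorted(steiner):
        if node < first_leaf:            # internal node on a path to a revoked leaf
            for child in (2 * node, 2 * node + 1):
                if child not in steiner:
                    cover.append(child)  # subtree without any revoked leaf
    return cover


def leaves_under(depth, node):
    """Leaf numbers (0-based) in the subtree rooted at the given node."""
    first_leaf = 1 << depth
    lo = hi = node
    while lo < first_leaf:
        lo, hi = 2 * lo, 2 * hi + 1
    return set(range(lo - first_leaf, hi - first_leaf + 1))


cover = cs_cover(3, [2, 5])              # 8 users, users 2 and 5 revoked
covered = set().union(*(leaves_under(3, v) for v in cover))
assert covered == {0, 1, 3, 4, 6, 7}
```

An update key is then issued once per node of the cover, and a non-revoked user can use it because exactly one node of the cover lies on the path from its leaf to the root.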

In the RS-ABE/RS-PE setting, a user u who has a private key with attributes x and an
update key with a revoked set R at time T' can decrypt a ciphertext with a policy f and
time T if the attribute satisfies the policy ($f(x) = 1$), the user is not revoked ($u \notin R$),
and $T \le T'$. The main challenge in combining all the components is protecting the
overall scheme against a collusion attack, e.g., a non-revoked user with a few attributes
should not decrypt more ciphertexts than he is allowed to, given the help of a revoked
user with many attributes. To achieve this, we use a secret sharing scheme as suggested
in 0. Roughly speaking, the overall scheme is associated with a secret key $\alpha$. For each
node $v_j$ in the revocation tree, this secret key $\alpha$ is split into $\gamma_j$ for ABE/PE, and $\alpha - \gamma_j$ for
SUE, where $\gamma_j$ is random. Initially, each user will have some tree nodes $v_j$ according
to the revocation mechanism, and get ABE/PE private keys subject to his attributes at
each of the $v_j$ (associated with the ABE/PE master secret $\gamma_j$). In key-update at time T, only
non-revoked users receive SUE private keys with time T at a tree node $v_j$ representing
a set of non-revoked users (associated with the SUE master secret $\alpha - \gamma_j$).
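
The per-node split of the master secret can be pictured with plain modular arithmetic. The Python sketch below is only an analogy under stated assumptions: in the scheme the shares live in the exponent of a bilinear group and are embedded in ABE/PE and SUE keys, whereas here they are integers modulo a toy prime, and the function names are hypothetical.

```python
import secrets

ORDER = 2**61 - 1   # toy prime modulus (assumption)


def split_master_secret(alpha, node_ids):
    """For each revocation-tree node j, return (gamma_j, alpha - gamma_j)."""
    shares = {}
    for j in node_ids:
        gamma_j = secrets.randbelow(ORDER)
        shares[j] = (gamma_j, (alpha - gamma_j) % ORDER)
    return shares


def recombine(abe_share, sue_share):
    """Shares recombine to alpha only when both come from the same node."""
    return (abe_share + sue_share) % ORDER


alpha = secrets.randbelow(ORDER)
shares = split_master_secret(alpha, node_ids=[3, 5, 9])
assert all(recombine(g, a_minus_g) == alpha for g, a_minus_g in shares.values())
```

Mixing the ABE/PE share of one node with the SUE share of another yields $\alpha + \gamma_j - \gamma_{j'}$, which is unrelated to $\alpha$ except with negligible probability; this is the intuition for why a revoked user's keys do not help a non-revoked colluder.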
1.3 Other Applications

Timed-Release Encryption. One application of SUE is timed-release encryption (TRE)
and its variants [27,28]. TRE is a specific type of PKE such that a ciphertext specified
with time T can only be decrypted after time T. In TRE, a semi-trusted time server
periodically broadcasts a time instant key (TIK) with time T' to all users. A sender creates
a ciphertext by specifying time T, and a receiver can decrypt the ciphertext if he has a
TIK with time T' such that $T' \ge T$. TRE can be used for electronic auctions, key escrow,
on-line gaming, and press releases. TRE and its variants can be realized by using IBE,
certificateless encryption, or forward-secure PKE (FSE). An SUE scheme can
be used for a TRE scheme with augmented properties, since a ciphertext with time T
can be decrypted by a private key with time $T' \ge T$ by using the ciphertext update
functionality, and, in addition, we have the flexibility of having a public ciphertext server
which can tune the ciphertext time forward before final public release. In this scheme,
a ciphertext consists of $O(\log T_{max})$ group elements and a TIK consists of $O(\log T_{max})$
group elements. TRE, in turn, can help in designing synchronized protocols, like fair
exchanges in some mediated but protocol-oblivious server model.

Key-Insulated Encryption with Ciphertext Forward Security. SUE can be used to
enhance the security of key-insulated encryption (KIE) [12]. KIE is a type of PKE that
additionally provides tolerance against key exposures. In KIE, a master
secret key MK is stored on a physically secure device, and a temporal key $SK_T$ for
time T is stored on an insecure device. At a time period T, a sender encrypts a
message with the time T and the public key PK, and then a receiver who obtains $SK_T$ by
interacting with the physically secure device can decrypt the ciphertext. KIE provides
the security of all time periods except those in which the compromise of temporal keys
occurred. KIE can be obtained from IBE. Though KIE provides a strong level of
security, it does not provide security of ciphertexts available in compromised time periods,
even if these ciphertexts are to be read in a future time period. To enhance the security
and prevent this premature disclosure, we can build a KIE scheme with forward-secure
storage by combining KIE and SUE schemes. Having cryptosystems with key-insulated
key and forward-secure storage is different from intrusion-resilient cryptosystems.

1.4 Related Work 

Attribute-Based Encryption. As mentioned, ABE extends IBE, such that a ciphertext 
is associated with an attribute x and a private key is associated with an access structure 
f. A user who has a private key with f can decrypt exactly those ciphertexts with an x 
that satisfies f(x) = 1 . Sahai and Waters 121 introduced fuzzy IBE (F-IBE) that is a 
special type of ABE. Goyal et al. IH proposed a key-policy ABE (KP-ABE) scheme 
that supports flexible access structures in private keys. Bethencourt et al. !2) proposed 
a ciphertext-policy ABE (CP-ABE) scheme such that a ciphertext is associated with 
an access structure f and a private key is associated with an attribute x. After that, 
numerous ABE schemes with various properties were proposed 13 fl8ll20ll25ll32ll . 
Predicate Encryption. PE is also an extension of IBE that additionally provides an 
attribute-hiding property in ciphertexts: A ciphertext is associated with an attribute x 
and a private key is associated with a predicate f. Boneh and Waters 0 introduced 
the concept of PE and proposed a hidden vector encryption (HVE) scheme that sup- 
ports conjunctive queries on encrypted data. Katz et al. HU proposed a PE scheme 
that supports inner-product queries on encrypted data. After that, many PE schemes 
with different properties were proposed (l2l[23]|24l[26). Boneh, Sahai, and Waters ||6ll 
formalized the concept of functional encryption (FE) by generalizing ABE and PE. 
Revocation. Boneh and Franklin 0 proposed a revocation method for IBE that periodically 
re-issues the private keys of users. That is, the identity ID of a user contains time 
information, and a revoked user cannot obtain a valid private key for a new time from the 
key generation center. However, this method requires all users to establish 
secure channels to the server and prove their identities every time. To solve this problem, 
Boldyreva et al. 0 proposed an R-IBE scheme by combining an F-IBE scheme 
and a full binary tree structure. Libert and Vergnaud m proposed a fully secure R-IBE 
scheme. 

2 Preliminaries 

2.1 Notation 

We let $\lambda$ be a security parameter. Let $[n]$ denote the set $\{1, \ldots, n\}$ for $n \in \mathbb{N}$. For a string 
$L \in \{0,1\}^n$, let $L[i]$ be the $i$th bit of $L$, and $L|_i$ be the prefix of $L$ with $i$-bit length. For 
example, if $L = 010$, then $L[1] = 0$, $L[2] = 1$, $L[3] = 0$, and $L|_1 = 0$, $L|_2 = 01$, $L|_3 = 010$. 
Concatenation of two strings $L$ and $L'$ is denoted by $L \| L'$. 


2.2 Full Binary Tree 

A full binary tree BT is a tree data structure where each node except the leaf nodes has 
two child nodes. Let N be the number of leaf nodes in BT. The number of all nodes 
in BT is $2N - 1$. For any index $0 \le i < 2N - 1$, we denote by $v_i$ a node in BT. We 
assign the index 0 to the root node and assign other indices to other nodes by using 
breadth-first search. The depth of a node $v_i$ is the length of the path from the root node 
to the node. The root node is at depth zero. Siblings are nodes that share the same parent 
node. 

For any node $v_i \in BT$, L is defined as a label that is a fixed and unique string. The 
label of each node in the tree is assigned as follows: Each edge in the tree is assigned 
0 or 1 depending on whether the edge connects a node to its left or right child node. 
The label L of a node $v_i$ is defined as the bitstring obtained by reading all the labels of 
edges in the path from the root node to the node $v_i$. Note that we assign a special empty 
string to the root node as a label. 
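
The index and label assignments described above can be illustrated with a small Python sketch. It is only an illustration: the function names `build_labels`, `bit`, and `prefix` are hypothetical helpers that do not appear in the paper, and the tree is assumed to be complete with a fixed depth.

```python
from collections import deque

def build_labels(depth):
    """Return the node labels of a full binary tree in breadth-first index order."""
    labels = []
    queue = deque([""])              # the root carries the special empty label
    while queue:
        label = queue.popleft()
        labels.append(label)         # the position in this list is the index i of v_i
        if len(label) < depth:
            queue.append(label + "0")    # the edge to the left child is labelled 0
            queue.append(label + "1")    # the edge to the right child is labelled 1
    return labels

def bit(label, i):
    """L[i]: the i-th bit of the label, 1-indexed as in Section 2.1."""
    return int(label[i - 1])

def prefix(label, i):
    """L|_i: the prefix of the label with i bits."""
    return label[:i]

labels = build_labels(2)   # a tree with N = 4 leaves and 2N - 1 = 7 nodes
```

Here `labels[i]` is the label of the node with breadth-first index i, so the root has index 0 and the empty label.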

2.3 Subset Cover Framework 

The subset cover (SC) framework introduced by Naor, Naor, and Lotspiech [22] is 
a general methodology for the construction of efficient revocation systems. The SC 
framework consists of the subset-assigning part and key-assigning part for the subset. 


We define the SC scheme by including only the subset-assigning part. The formal defi- 
nition of SC is given in the full version of this paper m. 

We use the complete subset (CS) scheme proposed by Naor et al. [22] as a building 
block for our schemes. The CS scheme uses a full binary tree BT to define the subsets 
$S_i$. For any node $v_i \in BT$, $\mathcal{T}_i$ is defined as the subtree that is rooted at $v_i$, and $S_i$ is defined 
as the set of leaf nodes in $\mathcal{T}_i$. For the tree BT and a subset R of leaf nodes, $ST(BT, R)$ is 
defined as the Steiner tree induced by the set R and the root node, that is, the minimal 
subtree of BT that connects all the leaf nodes in R and the root node. We simply denote 
$ST(BT, R)$ by $ST(R)$. The CS scheme is described as follows: 

CS.Setup($N_{max}$): This algorithm takes as input the maximum number of users $N_{max}$. 
Let $N_{max} = 2^d$ for simplicity. It first sets a full binary tree BT of depth d. Each user 
is assigned to a different leaf node in BT. The collection $\mathcal{S}$ of CS is $\{S_i : v_i \in BT\}$. 
Recall that $S_i$ is the set of all the leaves in the subtree $\mathcal{T}_i$. It outputs the full binary 
tree BT. 

CS.Assign(BT, u): This algorithm takes as input the tree BT and a user $u \in \mathcal{N}$. Let 
$v_u$ be the leaf node of BT that is assigned to the user u. Let $(v_{j_0}, v_{j_1}, \ldots, v_{j_d})$ 
be the path from the root node $v_{j_0} = v_0$ to the leaf node $v_{j_d} = v_u$. It sets 
$PV_u = \{S_{j_0}, \ldots, S_{j_d}\}$, and outputs the private set $PV_u$. 

CS.Cover(BT, R): This algorithm takes as input the tree BT and a revoked set R of 
users. It first computes the Steiner tree $ST(R)$. Let $\mathcal{T}_{i_1}, \ldots, \mathcal{T}_{i_m}$ be all the subtrees 
of BT that hang off $ST(R)$, that is, all subtrees whose roots $v_{i_1}, \ldots, v_{i_m}$ are not in 
$ST(R)$ but adjacent to nodes of outdegree 1 in $ST(R)$. It outputs a covering set 
$CV_R = \{S_{i_1}, \ldots, S_{i_m}\}$. 

CS.Match($CV_R$, $PV_u$): This algorithm takes as input a covering set $CV_R = \{S_{i_1}, \ldots, S_{i_m}\}$ 
and a private set $PV_u = \{S_{j_0}, \ldots, S_{j_d}\}$. It finds a subset $S_k$ such that $S_k \in CV_R$ and 
$S_k \in PV_u$. If there is such a subset, it outputs $(S_k, S_k)$. Otherwise, it outputs $\perp$. 

Lemma 1 ([22]). Let $N_{max}$ be the number of leaf nodes in a full binary tree and r be 
the size of a revoked set. In the CS scheme, the size of a private set is $O(\log N_{max})$ and 
the size of a covering set is at most $r \log(N_{max}/r)$. 
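
The subset-assigning part of the CS scheme can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: each subset $S_i$ is named by the label of the root of its subtree, users are identified with leaf labels, the function names are hypothetical, and returning the whole tree when nobody is revoked is an assumed convention.

```python
def cs_assign(leaf, depth):
    """PV_u: the subsets (named by root label) on the path from the root to the leaf."""
    assert len(leaf) == depth
    return [leaf[:i] for i in range(depth + 1)]

def cs_cover(revoked, depth):
    """CV_R: the roots of the subtrees that hang off the Steiner tree ST(R)."""
    if not revoked:
        return [""]                  # assumed convention: one subset covers everyone
    # ST(R) is the union of the root-to-leaf paths of all revoked leaves.
    steiner = {leaf[:i] for leaf in revoked for i in range(depth + 1)}
    cover = []
    for node in sorted(steiner):
        if len(node) == depth:
            continue                 # leaves have no children
        for child in (node + "0", node + "1"):
            if child not in steiner:
                cover.append(child)  # this subtree contains no revoked leaf
    return cover

def cs_match(cover, private):
    """Return the common subset S_k, or None (standing in for the symbol ⊥)."""
    common = set(cover) & set(private)
    return next(iter(common)) if common else None
```

For example, with depth 3 and the single revoked leaf `010`, `cs_cover` returns the three subsets rooted at `1`, `00`, and `011`, which matches the bound $r \log(N_{max}/r) = 3$ of Lemma 1.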

3 Self-Updatable Encryption 

3.1 Definitions 

Ciphertext Delegatable Encryption ( CDE). Before introducing self-updatable encryp- 
tion, we first introduce ciphertext delegatable encryption. Ciphertext delegatable en- 
cryption (CDE) is a special type of public-key encryption (PKE) with the ciphertext 
delegation property such that a ciphertext can be easily converted to a new ciphertext 
under a more restricted label string by using public values. The following is the syntax 
of CDE. 

Definition 1 (Ciphertext Delegatable Encryption). A ciphertext delegatable encryption 
(CDE) scheme for the set C of labels consists of seven PPT algorithms Init, Setup, 
GenKey, Encrypt, DelegateCT, RandCT, and Decrypt, which are defined as follows: 
Init($1^\lambda$). The initialization algorithm takes as input a security parameter $1^\lambda$, and it 
outputs a group description string GDS. 

Setup(GDS, $d_{max}$). The setup algorithm takes as input a group description string GDS 
and the maximum length $d_{max}$ of the label strings, and it outputs public parameters 
PP and a master secret key MK. 

GenKey(L, MK, PP). The key generation algorithm takes as input a label string 
$L \in \{0,1\}^k$ with $k \le d_{max}$, the master secret key MK, and the public parameters PP, 
and it outputs a private key $SK_L$. 

Encrypt(L, s, $\vec{s}$, PP). The encryption algorithm takes as input a label string $L \in \{0,1\}^d$ 
with $d \le d_{max}$, a random exponent s, an exponent vector $\vec{s}$, and the public parameters 
PP, and it outputs a ciphertext header $CH_L$ and a session key EK. 

DelegateCT($CH_L$, c, PP). The ciphertext delegation algorithm takes as input a ciphertext 
header $CH_L$ for a label string $L \in \{0,1\}^d$ with $d < d_{max}$, a bit value $c \in \{0,1\}$, 
and the public parameters PP, and it outputs a delegated ciphertext header $CH_{L'}$ 
for the label string $L' = L \| c$. 

RandCT($CH_L$, s', $\vec{s}'$, PP). The ciphertext randomization algorithm takes as input a 
ciphertext header $CH_L$ for a label string $L \in \{0,1\}^d$ with $d \le d_{max}$, a new random 
exponent s', a new vector $\vec{s}'$, and the public parameters PP, and it outputs a 
re-randomized ciphertext header $CH'_L$ and a partial session key EK'. 

Decrypt($CH_L$, $SK_{L'}$, PP). The decryption algorithm takes as input a ciphertext header 
$CH_L$, a private key $SK_{L'}$, and the public parameters PP, and it outputs a session key 
EK or the distinguished symbol $\perp$. 

The correctness property of CDE is defined as follows: For all PP, MK generated by 
Setup, all L, L', any $SK_{L'}$ generated by GenKey, any $CH_L$ and EK generated by Encrypt 
or DelegateCT, it is required that: 

- If L is a prefix of L', then Decrypt($CH_L$, $SK_{L'}$, PP) = EK. 

- If L is not a prefix of L', then Decrypt($CH_L$, $SK_{L'}$, PP) = $\perp$ with all but negligible 
probability. 

Additionally, it requires that the ciphertext distribution of RandCT is statistically equal 
to that of Encrypt. 

Remark 1. The syntax of CDE differs from the usual syntax of encryption since 
the encryption algorithm additionally takes random values as input instead of selecting its 
own randomness. Because of this difference, we cannot prove the security of SUE from 
the security of CDE, but this syntax difference is essential for the ciphertext efficiency 
of SUE. 

Self-Updatable Encryption (SUE). Self-updatable encryption (SUE) is a new type of 
PKE with a ciphertext updating property: a time is associated with private 
keys and ciphertexts, and a ciphertext with a time can be easily updated to a new 
ciphertext with a future time. In SUE, the private key of a user is associated with a time 
T' and a ciphertext is also associated with a time T. If $T \le T'$, then a user who has a 
private key with a time T' can decrypt a ciphertext with a time T. That is, a user who has 
a private key for a time T' can decrypt any ciphertext attached to a past time T such that 
$T \le T'$, but he cannot decrypt a ciphertext attached to a future time T such that $T' < T$. 
Additionally, the SUE scheme has the ciphertext update algorithm that updates the time 
T of a ciphertext to a new time T + 1 by using public parameters. The following is the 
syntax of SUE. 

Definition 2 (Self-Updatable Encryption). A self-updatable encryption (SUE) scheme 
consists of seven PPT algorithms Init, Setup, GenKey, Encrypt, UpdateCT, RandCT, 
and Decrypt, which are defined as follows: 

Init($1^\lambda$). The initialization algorithm takes as input a security parameter $1^\lambda$, and it 
outputs a group description string GDS. 

Setup(GDS, $T_{max}$). The setup algorithm takes as input a group description string GDS 
and the maximum time $T_{max}$, and it outputs public parameters PP and a master 
secret key MK. 

GenKey(T, MK, PP). The key generation algorithm takes as input a time T, the master 
secret key MK, and the public parameters PP, and it outputs a private key $SK_T$. 

Encrypt(T, s, PP). The encryption algorithm takes as input a time T, a random value 
s, and the public parameters PP, and it outputs a ciphertext header $CH_T$ and a 
session key EK. 

UpdateCT($CH_T$, T + 1, PP). The ciphertext update algorithm takes as input a ciphertext 
header $CH_T$ for a time T, a next time T + 1, and the public parameters PP, and 
it outputs an updated ciphertext header $CH_{T+1}$. 

RandCT($CH_T$, s', PP). The ciphertext randomization algorithm takes as input a ciphertext 
header $CH_T$ for a time T, a new random exponent s', and the public parameters 
PP, and it outputs a re-randomized ciphertext header $CH'_T$ and a partial 
session key EK'. 

Decrypt($CH_T$, $SK_{T'}$, PP). The decryption algorithm takes as input a ciphertext header 
$CH_T$, a private key $SK_{T'}$, and the public parameters PP, and it outputs a session 
key EK or the distinguished symbol $\perp$. 

The correctness property of SUE is defined as follows: For all PP, MK generated by 
Setup, all T, T', any $SK_{T'}$ generated by GenKey, and any $CH_T$ and EK generated by 
Encrypt or UpdateCT, it is required that: 

- If $T \le T'$, then Decrypt($CH_T$, $SK_{T'}$, PP) = EK. 

- If $T > T'$, then Decrypt($CH_T$, $SK_{T'}$, PP) = $\perp$ with all but negligible probability. 

Additionally, it requires that the ciphertext distribution of RandCT is statistically equal 
to that of Encrypt. 

Remark 2. For the definition of SUE, we follow the syntax of key encapsulation mech- 
anisms instead of following that of standard encryption schemes since the session key 
of SUE serves as the partial share of a real session key in other schemes. 

Definition 3 (Security). The security property for SUE schemes is defined in terms of 
the indistinguishability under a chosen plaintext attack (IND-CPA). The security game 
for this property is defined as the following game between a challenger C and a PPT 
adversary A: 
1. Setup: C runs Init and Setup to generate the public parameters PP and the master 
secret key MK, and it gives PP to A. 

2. Query 1: A may adaptively request a polynomial number of private keys for times 
$T_1, \ldots, T_{q_1}$, and C gives the corresponding private keys $SK_{T_1}, \ldots, SK_{T_{q_1}}$ to A by 
running GenKey($T_i$, MK, PP). 

3. Challenge: A outputs a challenge time T* subject to the following restriction: For 
all times $\{T_i\}$ of private key queries, it is required that $T_i < T^*$. C chooses a random 
bit $b \in \{0,1\}$ and computes a ciphertext header CH* and a session key EK* by 
running Encrypt(T*, s, PP). If b = 0, then it gives CH* and EK* to A. Otherwise, 
it gives CH* and a random session key to A. 

4. Query 2: A may continue to request private keys for additional times $T_{q_1+1}, \ldots, T_q$ 
subject to the same restriction as before, and C gives the corresponding private keys 
to A. 

5. Guess: Finally A outputs a bit b'. 

The advantage of A is defined as $Adv^{SUE}_{A}(\lambda) = |\Pr[b = b'] - 1/2|$ where the probability 
is taken over all the randomness of the game. An SUE scheme is fully secure under a 
chosen plaintext attack if for all PPT adversaries A the advantage of A in the above 
game is negligible in the security parameter $\lambda$. 

Remark 3. In the above security game, it is not necessary to explicitly describe UpdateCT 
since the adversary can apply UpdateCT to the challenge ciphertext header by just 
using PP. Note that the use of UpdateCT does not violate the security game since the 
adversary can only request private keys for $T_i$ such that $T_i < T^*$. 

3.2 Bilinear Groups of Composite Order 

Let $N = p_1 p_2 p_3$ where $p_1$, $p_2$, and $p_3$ are distinct prime numbers. Let $G$ and $G_T$ be two 
multiplicative cyclic groups of the same composite order N and g be a generator of G. The 
bilinear map $e : G \times G \to G_T$ has the following properties: 

1. Bilinearity: $\forall u, v \in G$ and $\forall a, b \in \mathbb{Z}_N$, $e(u^a, v^b) = e(u, v)^{ab}$. 

2. Non-degeneracy: $\exists g$ such that $e(g, g)$ has order N, that is, $e(g, g)$ is a generator of 
$G_T$. 

We say that G is a bilinear group if the group operations in G and $G_T$ as well as the 
bilinear map e are all efficiently computable. Furthermore, we assume that the description 
of G and $G_T$ includes generators of G and $G_T$ respectively. We use the notation $G_{p_i}$ to 
denote the subgroups of order $p_i$ of G respectively. Similarly, we use the notation $G_{T,p_i}$ 
to denote the subgroups of order $p_i$ of $G_T$ respectively. 

3.3 Complexity Assumptions 

We give three static assumptions in bilinear groups of composite order that were 
introduced by Lewko and Waters [19]. Assumption 1 (Subgroup Decision), Assumption 2 
(General Subgroup Decision), and Assumption 3 (Composite Diffie-Hellman) 
are described in the full version of this paper m. 
3.4 Design Principle 

We use a full binary tree to represent time in our SUE scheme by assigning time periods 
to all tree nodes instead of assigning time periods to leaf nodes only. The use of binary 
trees to construct key-evolving schemes dates back to the work of Bellare and Miner Q), 
and the idea of using all tree nodes to represent time periods was introduced by Canetti, 
Halevi, and Katz JS). They used a full binary tree for private key update in forward- 
secure PKE schemes, but we use the full binary tree for ciphertext update. 

In the full binary tree BT, each node v (internal node or leaf node) is assigned a 
unique time value by using the pre-order tree traversal that recursively visits the root 
node, the left subtree, and the right subtree. Note that we use breadth-first search for 
index assignment, but we use pre-order traversal for time assignment. Let Path(v) 
be the set of path nodes from the root node to a node v, RightSibling(Path(v))$^1$ be 
the set of right sibling nodes of Path(v), and TimeNodes(v) be the set of nodes that 
consists of v and RightSibling(Path(v)) excluding the parent's path nodes. That is, 
$TimeNodes(v) = \{v\} \cup RightSibling(Path(v)) \setminus Path(Parent(v))$. Pre-order traversal 
has the property that if a node v is associated with time T and a node v' is associated 
with time T', then we have 

$TimeNodes(v) \cap Path(v') \neq \emptyset$ if and only if $T \le T'$. 

Thus if a ciphertext has the delegation property such that its association can be changed 
from a node to its child node, then a ciphertext for the time T can be easily delegated 
to a ciphertext for the time T' such that $T \le T'$ by providing the ciphertexts of its own 
node and of the right sibling nodes of the path nodes, excluding the path nodes themselves. 
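
The pre-order property stated above can be checked exhaustively on a small tree. The following Python sketch is an illustration only: nodes are identified with their bitstring labels, times are counted from 0 (an assumed convention), and the helper names are hypothetical.

```python
from itertools import product

def all_labels(d_max):
    """Labels of every node (internal and leaf) in a full binary tree of depth d_max."""
    return ["".join(bits) for k in range(d_max + 1) for bits in product("01", repeat=k)]

def preorder_times(d_max):
    """Map each label to its pre-order visiting time, counted from 0."""
    # Pre-order visits a node before its subtrees and the left subtree before the
    # right one; for bitstring labels this coincides with lexicographic order.
    return {label: t for t, label in enumerate(sorted(all_labels(d_max)))}

def path(label):
    """Path(v): labels of all nodes from the root down to v, inclusive."""
    return {label[:i] for i in range(len(label) + 1)}

def time_nodes(label):
    """TimeNodes(v): v itself plus the right sibling of every left child on Path(v)."""
    nodes = [label]
    for i in range(len(label), 0, -1):
        if label[i - 1] == "0":              # this path node is a left child
            nodes.append(label[:i - 1] + "1")
    return nodes

times = preorder_times(3)
property_holds = all(
    bool(set(time_nodes(v)) & path(w)) == (times[v] <= times[w])
    for v in times for w in times
)
```

For the node with label `00`, `time_nodes` returns `00`, `01`, and `1`, which are exactly the roots of the subtrees visited at or after the time of `00`.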

For the construction of an SUE scheme that uses a full binary tree, we need a CDE 
scheme that has the ciphertext delegation property in the tree such that a ciphertext 
associated with a node can be converted to another ciphertext associated with its child 
node. Hierarchical identity-based encryption (HIBE) has a similar delegation property 
in the tree, but it is the private keys of HIBE that can be delegated Rirnn. To construct a CDE 
scheme that supports the ciphertext delegation property, we start from the HIBE scheme 
of Boneh and Boyen [4] and interchange the private key structure with the ciphertext 
structure of their HIBE scheme. To use the structure of HIBE, we associate each node 
with a unique label string $L \in \{0,1\}^*$. The ciphertext delegation property in CDE is 
easily obtained from the private-key delegation property of HIBE. 

To build an SUE scheme from the CDE scheme, we define a mapping function $\psi$ that 
maps time T to a label L in the tree nodes since these two schemes use the same full 
binary tree. The SUE ciphertext for time T consists of all CDE ciphertexts for all nodes 
in TimeNodes(v) where time T is associated with a node v. Although the ciphertext of 
SUE consists of just $O(\log T_{max})$ CDE ciphertexts, the ciphertext of SUE can 
have $O(\log^2 T_{max})$ group elements since the ciphertext of a naive CDE scheme from the 
HIBE scheme has $O(\log T_{max})$ group elements. To improve the 
ciphertext size, we use the randomness reuse technique for CDE ciphertexts. In this 
case, we obtain an SUE scheme with $O(\log T_{max})$ group elements in ciphertexts. 
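
As a rough worked count, and assuming the ciphertext layout of the construction in Section 3.5 (a CDE header for a label of length k has k + 2 group elements, and pruning removes the elements that a header shares with $CH^{(0)}$), the two sizes can be compared for a label of length $d \le d_{max}$. The notation $|\cdot|$ for the number of group elements is introduced here only for illustration.

```latex
\begin{align*}
|CH^{(0)}| &= d + 2, \\
|CH^{(j)}|_{\text{naive}} &= (d - j + 1) + 2, \qquad
|CH^{(j)}|_{\text{pruned}} = 2 \quad (1 \le j \le d), \\
\text{naive total} &\le (d + 2) + \sum_{j=1}^{d} (d - j + 3)
  = (d + 2) + \frac{d(d+5)}{2} = O(d^2), \\
\text{total with reuse} &\le (d + 2) + 2d = 3d + 2 = O(d).
\end{align*}
```

Since $d \le d_{max}$ and $T_{max} = 2^{d_{max}+1} - 1$, these bounds correspond to $O(\log^2 T_{max})$ and $O(\log T_{max})$ group elements, respectively.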

1 Note that we have RightSibling(Path(v)) = RightChild(Path(Parent(v))) where 
RightChild(Path(v)) is the set of right child nodes of Path(v) and Parent(v) is the 
parent node of v. 
3.5 Construction 

CDE.Init($1^\lambda$): This algorithm takes as input a security parameter $1^\lambda$. It generates a 
bilinear group G of composite order $N = p_1 p_2 p_3$ where $p_1$, $p_2$, and $p_3$ are random 
primes. It chooses a random generator $g_1 \in G_{p_1}$ and outputs a group description 
string as $GDS = ((N, G, G_T, e), g_1, p_1, p_2, p_3)$. 

CDE.Setup(GDS, $d_{max}$): This algorithm takes as input the string GDS and the maximum 
length $d_{max}$ of the label strings. Let $l = d_{max}$. It chooses random elements 
$w, \{u_{i,0}, u_{i,1}\}_{i=1}^{l}, \{h_{i,0}, h_{i,1}\}_{i=1}^{l} \in G_{p_1}$, a random exponent $\beta \in \mathbb{Z}_N$, and a random 
element $Y \in G_{p_3}$. We define $F_{i,b}(L) = u_{i,b}^{L} h_{i,b}$ where $i \in [l]$ and $b \in \{0,1\}$. It outputs 
the master secret key $MK = (\beta, Y)$ and the public parameters as 

\[ PP = \big( (N, G, G_T, e),\ g = g_1,\ w,\ \{u_{i,0}, u_{i,1}\}_{i=1}^{l},\ \{h_{i,0}, h_{i,1}\}_{i=1}^{l},\ \Lambda = e(g,g)^{\beta} \big). \]

CDE.GenKey(L, MK, PP): This algorithm takes as input a label string $L \in \{0,1\}^n$, the 
master secret key MK, and the public parameters PP. It first selects a random exponent 
$r \in \mathbb{Z}_N$ and random elements $Y_0, Y_1, Y_{2,1}, \ldots, Y_{2,n} \in G_{p_3}$. It outputs a private 
key as 

\[ SK_L = \big( K_0 = g^{\beta} w^{-r} Y_0,\ K_1 = g^{r} Y_1,\ K_{2,1} = F_{1,L[1]}(L|_1)^{r} Y_{2,1},\ \ldots,\ K_{2,n} = F_{n,L[n]}(L|_n)^{r} Y_{2,n} \big). \]

CDE.Encrypt(L, s, $\vec{s}$, PP): This algorithm takes as input a label string $L \in \{0,1\}^d$, a 
random exponent $s \in \mathbb{Z}_N$, a vector $\vec{s} = (s_1, \ldots, s_d) \in \mathbb{Z}_N^d$ of random exponents, and 
PP. It outputs a ciphertext header as 

\[ CH_L = \big( C_0 = g^{s},\ C_1 = w^{s} \prod_{i=1}^{d} F_{i,L[i]}(L|_i)^{s_i},\ C_{2,1} = g^{-s_1},\ \ldots,\ C_{2,d} = g^{-s_d} \big) \]

and a session key as $EK = \Lambda^{s}$. 

CDE.DelegateCT($CH_L$, c, PP): This algorithm takes as input a ciphertext header 
$CH_L = (C_0, \ldots, C_{2,d})$ for a label string $L \in \{0,1\}^d$, a bit value $c \in \{0,1\}$, and PP. It 
selects a random exponent $s_{d+1} \in \mathbb{Z}_N$ and outputs a delegated ciphertext header for 
the new label string $L' = L \| c$ as 

\[ CH_{L'} = \big( C_0,\ C'_1 = C_1 \cdot F_{d+1,c}(L')^{s_{d+1}},\ C_{2,1},\ \ldots,\ C_{2,d},\ C'_{2,d+1} = g^{-s_{d+1}} \big). \]

CDE.RandCT($CH_L$, s', $\vec{s}'$, PP): This algorithm takes as input a ciphertext header 
$CH_L = (C_0, \ldots, C_{2,d})$ for a label string $L \in \{0,1\}^d$, a new random exponent $s' \in \mathbb{Z}_N$, 
a new vector $\vec{s}' = (s'_1, \ldots, s'_d) \in \mathbb{Z}_N^d$, and PP. It outputs a re-randomized ciphertext 
header as 

\[ CH'_L = \big( C'_0 = C_0 \cdot g^{s'},\ C'_1 = C_1 \cdot w^{s'} \prod_{i=1}^{d} F_{i,L[i]}(L|_i)^{s'_i},\ C'_{2,1} = C_{2,1} \cdot g^{-s'_1},\ \ldots,\ C'_{2,d} = C_{2,d} \cdot g^{-s'_d} \big) \]

and a partial session key $EK' = \Lambda^{s'}$ that will be multiplied with the session key EK 
of $CH_L$ to produce a re-randomized session key. 
CDE.Decrypt($CH_L$, $SK_{L'}$, PP): This algorithm takes as input a ciphertext header $CH_L$ 
for a label string $L \in \{0,1\}^d$, a private key $SK_{L'}$ for a label string $L' \in \{0,1\}^n$, and PP. 
If L is a prefix of L', then it computes $CH'_{L'} = (C'_0, \ldots, C'_{2,n})$ by running DelegateCT 
and outputs a session key as $EK = e(C'_0, K_0) \cdot e(C'_1, K_1) \cdot \prod_{i=1}^{n} e(C'_{2,i}, K_{2,i})$. Otherwise, 
it outputs $\perp$. 

Let $\psi$ be a mapping from time T to a label L.$^2$ Our SUE scheme that uses our CDE 
scheme as a building block is described as follows: 

SUE.Init($1^\lambda$): This algorithm outputs GDS by running CDE.Init($1^\lambda$). 

SUE.Setup(GDS, $T_{max}$): This algorithm outputs MK and PP by running CDE.Setup(GDS, $d_{max}$) 
where $T_{max} = 2^{d_{max}+1} - 1$. 

SUE.GenKey(T, MK, PP): This algorithm outputs $SK_T$ by running CDE.GenKey($\psi(T)$, MK, PP). 

SUE.Encrypt(T, s, PP): This algorithm takes as input a time T, a random exponent 
$s \in \mathbb{Z}_N$, and PP. It proceeds as follows: 

1. It first sets a label string $L \in \{0,1\}^d$ by computing $\psi(T)$. It sets an exponent 
vector $\vec{s} = (s_1, \ldots, s_d)$ by selecting random exponents $s_1, \ldots, s_d \in \mathbb{Z}_N$, and obtains 
$CH^{(0)}$ by running CDE.Encrypt(L, s, $\vec{s}$, PP). 

2. For $1 \le j \le d$, it sets $L^{(j)} = L|_{d-j} \| 1$ and proceeds with the following steps: 

(a) If $L^{(j)} = L|_{d-j+1}$, then it sets $CH^{(j)}$ as an empty one. 

(b) Otherwise, it sets a new exponent vector $\vec{s}' = (s'_1, \ldots, s'_{d-j+1})$ where 
$s'_1, \ldots, s'_{d-j}$ are copied from $\vec{s}$ and $s'_{d-j+1}$ is randomly selected in $\mathbb{Z}_N$ since 
$L^{(j)}$ and L have the same prefix string. It obtains $CH^{(j)} = (C'_0, \ldots, C'_{2,d-j+1})$ 
by running CDE.Encrypt($L^{(j)}$, s, $\vec{s}'$, PP). It also prunes the redundant elements 
$C'_0, C'_{2,1}, \ldots, C'_{2,d-j}$ from $CH^{(j)}$, which are already contained in $CH^{(0)}$. 

3. It removes all empty $CH^{(j)}$ and sets $CH_T = (CH^{(0)}, CH^{(1)}, \ldots, CH^{(d')})$ for 
some $d' \le d$ that consists of non-empty $CH^{(j)}$. 

4. It outputs a ciphertext header as $CH_T$ and a session key as $EK = \Lambda^{s}$. 

SUE.UpdateCT($CH_T$, T + 1, PP): This algorithm takes as input a ciphertext header 
$CH_T = (CH^{(0)}, \ldots, CH^{(d)})$ for a time T, a next time T + 1, and PP. Let $L^{(j)}$ be the 
label of $CH^{(j)}$. It proceeds as follows: 

1. If the length d of $L^{(0)}$ is less than $d_{max}$, then it first obtains $CH_{L^{(0)}\|0}$ and 
$CH_{L^{(0)}\|1}$ by running CDE.DelegateCT($CH^{(0)}$, c, PP) for all $c \in \{0,1\}$ since 
$CH_{L^{(0)}\|0}$ is the ciphertext header for the next time T + 1 by pre-order traversal. 
It also prunes the redundant elements in $CH_{L^{(0)}\|1}$. It outputs an updated ciphertext 
header as 

\[ CH_{T+1} = \big( CH'^{(0)} = CH_{L^{(0)}\|0},\ CH'^{(1)} = CH_{L^{(0)}\|1},\ CH'^{(2)} = CH^{(1)},\ \ldots,\ CH'^{(d+1)} = CH^{(d)} \big). \]

2 In a full binary tree, each node is associated with a unique time T by the pre-order traversal 
and a unique label L by the label assignment. Thus there exists a unique mapping function $\psi$ 
from a time T to a label L. 
2. Otherwise, it copies the common elements in $CH^{(0)}$ to $CH^{(1)}$ and simply removes 
$CH^{(0)}$ since $CH^{(1)}$ is the ciphertext header for the next time T + 1 by pre-order 
traversal. It outputs an updated ciphertext header as 
$CH_{T+1} = (CH'^{(0)} = CH^{(1)}, \ldots, CH'^{(d-1)} = CH^{(d)})$. 

SUE.RandCT($CH_T$, s', PP): This algorithm takes as input a ciphertext header 
$CH_T = (CH^{(0)}, \ldots, CH^{(d)})$ for a time T, a new random exponent $s' \in \mathbb{Z}_N$, and PP. Let $L^{(j)}$ 
be the label of $CH^{(j)}$ and $d^{(j)}$ be the length of the label $L^{(j)}$. It proceeds as follows: 

1. It first sets a vector $\vec{s}' = (s'_1, \ldots, s'_{d^{(0)}})$ by selecting random exponents 
$s'_1, \ldots, s'_{d^{(0)}} \in \mathbb{Z}_N$, and obtains $CH'^{(0)}$ by running CDE.RandCT($CH^{(0)}$, s', $\vec{s}'$, PP). 

2. For $1 \le j \le d$, it sets a new vector $\vec{s}'' = (s''_1, \ldots, s''_{d^{(j)}})$ where $s''_1, \ldots, s''_{d^{(j)}-1}$ are 
copied from $\vec{s}'$ and $s''_{d^{(j)}}$ is randomly chosen in $\mathbb{Z}_N$, and obtains $CH'^{(j)}$ by running 
CDE.RandCT($CH^{(j)}$, s', $\vec{s}''$, PP). 

3. It outputs a re-randomized ciphertext header as $CH'_T = (CH'^{(0)}, \ldots, CH'^{(d)})$ 
and a partial session key as $EK' = \Lambda^{s'}$ that will be multiplied with the session 
key EK of $CH_T$ to produce a re-randomized session key. 

SUE.Decrypt($CH_T$, $SK_{T'}$, PP): This algorithm takes as input a ciphertext header $CH_T$, 
a private key $SK_{T'}$, and PP. If $T \le T'$, then it finds $CH^{(j)}$ from $CH_T$ such that $L^{(j)}$ is 
a prefix of $L' = \psi(T')$ and outputs EK by running CDE.Decrypt($CH^{(j)}$, $SK_{T'}$, PP). 
Otherwise, it outputs $\perp$. 
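
The label bookkeeping of SUE.Encrypt, SUE.UpdateCT, and SUE.Decrypt can be sketched without any group arithmetic. The Python fragment below tracks only which labels $L^{(0)}, L^{(1)}, \ldots$ a ciphertext header carries; it is an illustration under assumptions, with times counted from 0 and with hypothetical function names.

```python
from itertools import product

def psi_table(d_max):
    """All node labels in pre-order; position t in the list is the label for time t."""
    labels = ["".join(bits) for k in range(d_max + 1) for bits in product("01", repeat=k)]
    return sorted(labels)    # lexicographic order equals pre-order for bitstring labels

def encrypt_labels(label):
    """Labels L^(0), L^(1), ... of the non-empty CDE headers for the node `label`."""
    d = len(label)
    labels = [label]
    for j in range(1, d + 1):
        candidate = label[:d - j] + "1"          # L^(j) = L|_{d-j} || 1
        if candidate != label[:d - j + 1]:       # otherwise CH^(j) is empty
            labels.append(candidate)
    return labels

def update_labels(labels, d_max):
    """Label view of SUE.UpdateCT: move the header from time T to time T + 1."""
    head, rest = labels[0], labels[1:]
    if len(head) < d_max:
        return [head + "0", head + "1"] + rest   # delegate L^(0) to both children
    return rest                                  # drop L^(0); L^(1) becomes the head

def find_header(labels, key_label):
    """Label view of SUE.Decrypt: the L^(j) that is a prefix of psi(T'), or None."""
    for label in labels:
        if key_label.startswith(label):
            return label
    return None
```

Under these assumptions, updating the label list of time T gives exactly the label list that a fresh encryption for time T + 1 would produce, and `find_header` succeeds precisely when $T \le T'$.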

Remark 4. The ciphertext delegation (or update) algorithm of CDE (or SUE) just out- 
puts a valid ciphertext header. However, we can easily modify it to output a ciphertext 
header that is identically distributed with that of the encrypt algorithm of CDE (or SUE) 
by applying the ciphertext randomization algorithm. 

3.6 Correctness 

In CDE, if the label string L of a ciphertext is a prefix of the label string L' of a private 
key, then the ciphertext can be changed to a new ciphertext for the label string L' by us- 
ing the ciphertext delegation algorithm. Thus the correctness of CDE is easily obtained 
from the following equation. 

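\begin{align*}
& e(C_0, K_0) \cdot e(C_1, K_1) \cdot \prod_{i=1}^{n} e(C_{2,i}, K_{2,i}) \\
&= e(g^{s}, g^{\beta} w^{-r} Y_0) \cdot e\Big(w^{s} \prod_{i=1}^{n} F_{i,L[i]}(L|_i)^{s_i},\ g^{r} Y_1\Big) \cdot \prod_{i=1}^{n} e\big(g^{-s_i}, F_{i,L[i]}(L|_i)^{r} Y_{2,i}\big) \\
&= e(g^{s}, g^{\beta}) \cdot e(g^{s}, w^{-r}) \cdot e(w^{s}, g^{r}) = e(g,g)^{\beta s}.
\end{align*}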

The SUE ciphertext header of a time T consists of the CDE ciphertext headers 
$CH^{(0)}, CH^{(1)}, \ldots, CH^{(d)}$ that are associated with the nodes in TimeNodes(v). If the 
SUE private key of a time T' associated with a node v' satisfies $T \le T'$, then we can 
find a unique node v'' such that $TimeNodes(v) \cap Path(v') = \{v''\}$ by the property of the 
pre-order tree traversal. Let CH'' be the CDE ciphertext header that is associated with 
the node v''. The correctness of SUE is easily obtained from the correctness of CDE 
since the label string L'' of CH'' is a prefix of the label string L' of the private key. 
In CDE, the output of CDE.DelegateCT is a valid ciphertext header since the function 
$F_{d+1,c}(L')$ is used with a new random exponent $s_{d+1}$ for the new label string L' 
with depth d + 1. The output of CDE.RandCT is statistically indistinguishable from 
that of CDE.Encrypt since it has a random exponent $s'' = s + s'$ and a random vector 
$\vec{s}'' = (s_1 + s'_1, \ldots, s_d + s'_d)$ where $s, s_1, \ldots, s_d$ are the original values in the ciphertext header 
and $s', s'_1, \ldots, s'_d$ are newly selected random values. 

In SUE, the output of SUE.UpdateCT is a valid ciphertext header since the output 
of CDE.DelegateCT is a valid ciphertext header and the CDE ciphertext headers 
$CH^{(0)}, \ldots, CH^{(d)}$ are still associated with the nodes in TimeNodes(v) where v is a node 
for the time T + 1. The output of SUE.RandCT is statistically indistinguishable from 
that of the encryption algorithm since new random exponents $s', s'_1, \ldots, s'_{d^{(0)}}$ are chosen 
and these random exponents are reused among the CDE ciphertext headers. 


3.7 Security Analysis 

Theorem 1. The above SUE scheme is fully secure under a chosen plaintext attack 
if Assumptions 1, 2, and 3 hold. That is, for any PPT adversary A, we have that 
$Adv^{SUE}_{A}(\lambda) \le Adv^{A1}_{B}(\lambda) + 2q\, Adv^{A2}_{B}(\lambda) + Adv^{A3}_{B}(\lambda)$ where q is the maximum number 
of private key queries of A. 

The proof of this theorem is given in the full version of this paper m. 

4 Revocable-Storage Attribute-Based Encryption 

4.1 Definitions 

Revocable-storage attribute-based encryption (RS-ABE) is attribute-based encryption 
(ABE) that additionally supports the revocation functionality and the ciphertext update 
functionality. Boldyreva, Goyal, and Kumar introduced the concept of revocable ABE 
(R-ABE) that provides the revocation functionality 0, and Sahai, Seyalioglu, and Waters 
introduced the concept of RS-ABE that provides the ciphertext update functionality 
in R-ABE [29]. 

Definition 4 (Revocable-Storage Attribute-Based Encryption). A revocable-storage 
(ciphertext-policy) attribute-based encryption (RS-ABE) scheme consists of seven PPT 
algorithms Setup, GenKey, UpdateKey, Encrypt, UpdateCT, RandCT, and Decrypt, 
which are defined as follows: 

Setup($1^\lambda$, $\mathcal{U}$, $T_{max}$, $N_{max}$). The setup algorithm takes as input a security parameter $1^\lambda$, 
the universe of attributes $\mathcal{U}$, the maximum time $T_{max}$, and the maximum number of 
users $N_{max}$, and it outputs public parameters PP and a master secret key MK. 

GenKey(S, u, MK, PP). The key generation algorithm takes as input a set of attributes 
$S \subseteq \mathcal{U}$, a user index $u \in \mathcal{N}$, the master secret key MK, and the public parameters 
PP, and it outputs a private key $SK_{S,u}$. 

UpdateKey(T, R, MK, PP). The key update algorithm takes as input a time $T \le T_{max}$, a 
set of revoked users $R \subseteq \mathcal{N}$, the master secret key MK, and the public parameters 
PP, and it outputs an update key $UK_{T,R}$. 

Encrypt($\mathbb{A}$, T, M, PP). The encryption algorithm takes as input an access structure $\mathbb{A}$, 
a time $T \le T_{max}$, a message M, and the public parameters PP, and it outputs a 
ciphertext $CT_{\mathbb{A},T}$. 

UpdateCT($CT_{\mathbb{A},T}$, T + 1, PP). The ciphertext update algorithm takes as input a ciphertext 
$CT_{\mathbb{A},T}$ for an access structure $\mathbb{A}$ and a time T, a new time T + 1 such that 
$T + 1 \le T_{max}$, and the public parameters PP, and it outputs an updated ciphertext 
$CT_{\mathbb{A},T+1}$. 

RandCT($CT_{\mathbb{A},T}$, PP). The ciphertext randomization algorithm takes as input a ciphertext 
$CT_{\mathbb{A},T}$ for an access structure $\mathbb{A}$ and a time T, and the public parameters PP, 
and it outputs a re-randomized ciphertext $CT'_{\mathbb{A},T}$. 

Decrypt($CT_{\mathbb{A},T}$, $SK_{S,u}$, $UK_{T',R}$, PP). The decryption algorithm takes as input a ciphertext 
$CT_{\mathbb{A},T}$, a private key $SK_{S,u}$, an update key $UK_{T',R}$, and the public parameters 
PP, and it outputs a message M or the distinguished symbol $\perp$. 

The correctness property of RS-ABE is defined as follows: For all PP, MK generated 
by Setup, all S and u, any $SK_{S,u}$ generated by GenKey, all $\mathbb{A}$, T, and M, any $CT_{\mathbb{A},T}$ 
generated by Encrypt or UpdateCT, all T' and R, any $UK_{T',R}$ generated by UpdateKey, 
it is required that: 

- If $(S \in \mathbb{A}) \wedge (u \notin R) \wedge (T \le T')$, then Decrypt($CT_{\mathbb{A},T}$, $SK_{S,u}$, $UK_{T',R}$, PP) = M. 

- If $(S \notin \mathbb{A}) \vee (u \in R) \vee (T' < T)$, then Decrypt($CT_{\mathbb{A},T}$, $SK_{S,u}$, $UK_{T',R}$, PP) = $\perp$ with 
all but negligible probability. 

Additionally, it requires that the ciphertext distribution of RandCT is statistically equal 
to that of Encrypt. 

Definition 5 (Security). The security property for RS-ABE is defined in terms of the
indistinguishability under a chosen plaintext attack (IND-CPA). The security game for
this property is defined as the following game between a challenger $\mathcal{C}$ and a PPT
adversary $\mathcal{A}$:

1. Setup: $\mathcal{C}$ runs Setup to generate the public parameters $PP$ and the master secret
key $MK$, and it gives $PP$ to $\mathcal{A}$.

2. Query 1: $\mathcal{A}$ may adaptively request a polynomial number of private keys and update
keys. $\mathcal{C}$ proceeds as follows:

- If this is a private key query for a set of attributes $S$ and a user index $u$, then it gives
the corresponding private key $SK_{S,u}$ to $\mathcal{A}$ by running GenKey($S, u, MK, PP$).
Note that the adversary is allowed to query only one private key for each user $u$.

- If this is an update key query for an update time $T$ and a set of revoked
users $R$, then it gives the corresponding update key $UK_{T,R}$ to $\mathcal{A}$ by running
UpdateKey($T, R, MK, PP$). Note that the adversary is allowed to query only
one update key for each time $T$.

3. Challenge: $\mathcal{A}$ outputs a challenge access structure $\mathbb{A}^*$, a challenge time $T^*$, and
challenge messages $M_0^*, M_1^* \in \mathcal{M}$ of equal length subject to the following restriction:

- It is required that $(S_i \notin \mathbb{A}^*) \vee (u_i \in R_j) \vee (T_j < T^*)$ for all $\{(S_i, u_i)\}$ of private
key queries and all $\{(T_j, R_j)\}$ of update key queries.

$\mathcal{C}$ chooses a random bit $b$ and gives the ciphertext $CT^*$ to $\mathcal{A}$ by running
Encrypt($\mathbb{A}^*, T^*, M_b^*, PP$).

4. Query 2: $\mathcal{A}$ may continue to request private keys and update keys subject to the
same restrictions as before, and $\mathcal{C}$ gives the corresponding private keys and update
keys to $\mathcal{A}$.

5. Guess: Finally $\mathcal{A}$ outputs a bit $b'$.

The advantage of $\mathcal{A}$ is defined as $\mathsf{Adv}^{RS\text{-}ABE}_{\mathcal{A}}(\lambda) = |\Pr[b = b'] - \frac{1}{2}|$ where the probability
is taken over all the randomness of the game. An RS-ABE scheme is fully secure under a
chosen plaintext attack if for all PPT adversaries $\mathcal{A}$, the advantage of $\mathcal{A}$ in the above
game is negligible in the security parameter $\lambda$.
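The restriction in the Challenge step can be read as a check over every pair of a private key query and an update key query. The following is a minimal Python sketch of that check, not part of the scheme itself: the function name, the representation of attribute sets as frozensets, and the `satisfies_challenge` callback (standing in for the test "$S_i \in \mathbb{A}^*$") are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def challenge_is_admissible(
    satisfies_challenge: Callable[[FrozenSet[str]], bool],  # hypothetical test for S in A*
    t_star: int,                                            # challenge time T*
    key_queries: Iterable[Tuple[FrozenSet[str], int]],      # pairs (S_i, u_i)
    update_queries: Iterable[Tuple[int, FrozenSet[int]]],   # pairs (T_j, R_j)
) -> bool:
    """Return True iff (S_i not in A*) or (u_i in R_j) or (T_j < T*)
    holds for every private key query and every update key query."""
    updates = list(update_queries)
    for attrs, user in key_queries:
        for t_j, revoked in updates:
            harmless = (
                not satisfies_challenge(attrs)  # key does not satisfy A*
                or user in revoked              # user is revoked in this update key
                or t_j < t_star                 # update key is older than T*
            )
            if not harmless:
                return False
    return True
```

A challenger would apply the same check again in Query 2, since the later queries are subject to the same restriction.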

Remark 5. In the above security game, it is not necessary to describe UpdateCT
explicitly, since the adversary can apply UpdateCT to the challenge ciphertext using only
$PP$. Note that the use of UpdateCT does not violate the security game because of the
restrictions in the game.

4.2 Construction 

For our RS-ABE scheme, we use the (ciphertext-policy) ABE scheme of Lewko et al. 
ED as a primary encryption scheme with slight modifications. That is, we use the key 
encapsulation mechanism version of CP- ABE and the encryption algorithm additionally 
takes input a random exponent for a session key. The detailed description of CP- ABE is 
given in the full version of this paper m. Our RS-ABE scheme is described as follows: 

RS-ABE.Setup($1^\lambda, \mathcal{U}, T_{max}, N_{max}$): This algorithm takes as input a security parameter
$1^\lambda$, the universe of attributes $\mathcal{U}$, the maximum time $T_{max}$, and the maximum number
of users $N_{max}$.

1. It first generates bilinear groups $G, G_T$ of composite order $N = p_1 p_2 p_3$ where
$p_1, p_2$, and $p_3$ are random primes. Let $g_1$ be the generator of $G_{p_1}$. It sets $GDS =
((N, G, G_T, e), g_1, p_1, p_2, p_3)$.

2. It obtains $MK_{ABE}, PP_{ABE}$ and $MK_{SUE}, PP_{SUE}$ by running ABE.Setup($GDS, \mathcal{U}$)
and SUE.Setup($GDS, T_{max}$) respectively. It also obtains $BT$ by running
CS.Setup($N_{max}$) and assigns a random exponent $\gamma_i \in \mathbb{Z}_N$ to each node $v_i$ in
$BT$.

3. It selects a random exponent $\alpha \in \mathbb{Z}_N$, and then it outputs $MK = (MK_{ABE}, MK_{SUE},
\alpha, BT)$ and $PP = (PP_{ABE}, PP_{SUE}, g = g_1, \Omega = e(g,g)^{\alpha})$.

RS-ABE.GenKey($S, u, MK, PP$): This algorithm takes as input a set of attributes $S$, a
user index $u$, $MK = (MK_{ABE}, MK_{SUE}, \alpha, BT)$, and $PP$.

1. It first obtains $PV_u = \{S_{j_0}, \ldots, S_{j_d}\}$ by running CS.Assign($BT, u$) and retrieves
$\{\gamma_{j_0}, \ldots, \gamma_{j_d}\}$ from $BT$ where $\gamma_i$ is assigned to the node $v_i$.

2. For $0 \le k \le d$, it sets $MK'_{ABE} = (\gamma_{j_k}, Y)$ and obtains $SK_{ABE,k}$ by running
ABE.GenKey($S, MK'_{ABE}, PP_{ABE}$).

3. It outputs $SK_{S,u} = (PV_u, SK_{ABE,0}, \ldots, SK_{ABE,d})$.

RS-ABE.UpdateKey($T, R, MK, PP$): This algorithm takes as input an update time $T$,
a set of revoked users $R$, $MK$, and $PP$.

1. It first obtains $CV_R = \{S_{i_1}, \ldots, S_{i_m}\}$ by running CS.Cover($BT, R$) and retrieves
$\{\gamma_{i_1}, \ldots, \gamma_{i_m}\}$ from $BT$.

2. For $1 \le k \le m$, it sets $MK'_{SUE} = (\alpha - \gamma_{i_k}, Y)$ and obtains $SK_{SUE,k}$ by running
SUE.GenKey($T, MK'_{SUE}, PP_{SUE}$).

3. It outputs $UK_{T,R} = (CV_R, SK_{SUE,1}, \ldots, SK_{SUE,m})$.

RS-ABE.Encrypt($\mathbb{A}, T, M, PP$): This algorithm takes as input an LSSS access
structure $\mathbb{A}$, a time $T$, a message $M$, and $PP$. It selects a random exponent $s \in \mathbb{Z}_N$ and
obtains $CH_{ABE}$ and $CH_{SUE}$ by running ABE.Encrypt($\mathbb{A}, s, PP_{ABE}$) and SUE.Encrypt($T, s, PP_{SUE}$)
respectively. It outputs $CT_{\mathbb{A},T} = (CH_{ABE}, CH_{SUE}, C = \Omega^{s} \cdot M)$.

RS-ABE.UpdateCT($CT_{\mathbb{A},T}, T+1, PP$): This algorithm takes as input a ciphertext
$CT_{\mathbb{A},T} = (CH_{ABE}, CH_{SUE}, C)$ for an LSSS access structure $\mathbb{A}$ and a time $T$, a new time
$T+1$, and $PP$. It obtains $CH'_{SUE}$ by running SUE.UpdateCT($CH_{SUE}, T+1, PP_{SUE}$).
It outputs $CT_{\mathbb{A},T+1} = (CH_{ABE}, CH'_{SUE}, C)$.

RS-ABE.RandCT($CT_{\mathbb{A},T}, PP$): This algorithm takes as input a ciphertext $CT_{\mathbb{A},T} =
(CH_{ABE}, CH_{SUE}, C)$ and $PP$. It first selects a random exponent $s' \in \mathbb{Z}_N$. It obtains
$CH'_{ABE}$ and $CH'_{SUE}$ by running ABE.RandCT($CH_{ABE}, s', PP_{ABE}$) and SUE.RandCT($CH_{SUE}, s', PP_{SUE}$),
respectively. It outputs $CT'_{\mathbb{A},T} = (CH'_{ABE}, CH'_{SUE}, C' = C \cdot \Omega^{s'})$.

RS-ABE.Decrypt($CT_{\mathbb{A},T}, SK_{S,u}, UK_{T',R}, PP$): This algorithm takes as input a
ciphertext $CT_{\mathbb{A},T} = (CH_{ABE}, CH_{SUE}, C)$, a private key $SK_{S,u} = (PV_u, SK_{ABE,0}, \ldots, SK_{ABE,d})$,
an update key $UK_{T',R} = (CV_R, SK_{SUE,1}, \ldots, SK_{SUE,m})$, and $PP$.

1. If $u \notin R$, then it obtains $(S_i, S_j)$ by running CS.Match($CV_R, PV_u$). Otherwise, it
outputs $\perp$.

2. If $S \in \mathbb{A}$ and $T \le T'$, then it can obtain $EK_{ABE}$ and $EK_{SUE}$ by running
ABE.Decrypt($CH_{ABE}, SK_{ABE,j}, PP_{ABE}$) and SUE.Decrypt($CH_{SUE}, SK_{SUE,i}, PP_{SUE}$)
respectively and outputs $M$ by computing $C \cdot (EK_{ABE} \cdot EK_{SUE})^{-1}$. Otherwise, it
outputs $\perp$.
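The last decryption step can be checked with a short calculation. The sketch below rests on two assumptions that this section does not spell out: that the key-encapsulation variants of ABE and SUE return the session keys $e(g,g)^{\gamma s}$ and $e(g,g)^{(\alpha-\gamma)s}$ for master-key exponents $\gamma$ and $\alpha-\gamma$, and that the pair $(S_i, S_j)$ returned by CS.Match refers to the same node of $BT$, so that both keys use the same exponent $\gamma$.

```latex
\begin{align*}
EK_{ABE} \cdot EK_{SUE}
  &= e(g,g)^{\gamma s} \cdot e(g,g)^{(\alpha - \gamma) s}
   = e(g,g)^{\alpha s}
   = \Omega^{s}, \\
C \cdot (EK_{ABE} \cdot EK_{SUE})^{-1}
  &= \Omega^{s} \cdot M \cdot \Omega^{-s}
   = M .
\end{align*}
```

Under these assumptions neither key alone yields $\Omega^{s}$: a user needs both a private key (carrying $\gamma$) and a current update key (carrying $\alpha - \gamma$) for a matching node.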

Remark 6. The ciphertext update algorithm of our scheme just outputs a valid updated 
ciphertext since a past ciphertext will be erased in most applications. However, the 
definition of Sahai et al. El requires that the output of UpdateCT should be equally 
distributed with that of Encrypt. Our scheme also can meet this strong requirement by 
applying RandCT to the output of UpdateCT. 

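Theorem 2. The above RS-ABE scheme is fully secure under a chosen plaintext
attack if Assumptions 1, 2, and 3 hold. That is, for any PPT adversary $\mathcal{A}$, we have that
$\mathsf{Adv}^{RS\text{-}ABE}_{\mathcal{A}}(\lambda) \le \mathsf{Adv}^{A1}_{\mathcal{B}}(\lambda) + O(q) \cdot \mathsf{Adv}^{A2}_{\mathcal{B}}(\lambda) + \mathsf{Adv}^{A3}_{\mathcal{B}}(\lambda)$ where $q$ is the maximum
number of private key and update key queries of $\mathcal{A}$.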

The proof of this theorem is given in the full version of this paper m. 

4.3 Discussions and RS-PE Results 

Efficiency. In our RS-ABE scheme, the number of group elements in a ciphertext is
$2l + 3\log T_{max}$ where $l$ is the row size of an access structure. In the RS-ABE scheme
of Sahai et al., the number of group elements in a ciphertext is $2\log T_{max} \cdot (l +
2\log T_{max})$ since a piecewise CP-ABE scheme was used.

Revocable-Storage Predicate Encryption. If we use the PE scheme of Park [26] as a
primary encryption scheme, then we can build an RS-PE scheme in prime order bilinear
groups that additionally supports the attribute-hiding property. The definition, construction,
and proof of RS-PE are given in the full version of this paper na. 
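To make the ciphertext sizes quoted in the Efficiency paragraph concrete, the following Python sketch evaluates both expressions for sample parameters. It assumes base-2 logarithms (natural for a binary time tree, but not stated explicitly above), and the function names are chosen here for illustration only.

```python
import math


def ct_size_this_scheme(l: int, t_max: int) -> float:
    """Group elements per ciphertext: 2*l + 3*log(T_max) (log base 2 assumed)."""
    return 2 * l + 3 * math.log2(t_max)


def ct_size_sahai_et_al(l: int, t_max: int) -> float:
    """Group elements per ciphertext: 2*log(T_max) * (l + 2*log(T_max))."""
    log_t = math.log2(t_max)
    return 2 * log_t * (l + 2 * log_t)


if __name__ == "__main__":
    # Sample parameters: an access structure with 10 rows and T_max = 2^10.
    print(ct_size_this_scheme(10, 1024), ct_size_sahai_et_al(10, 1024))
```

The first expression grows additively in $l$ and $\log T_{max}$, while the second grows with their product, which is where the saving comes from.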

Theorem 3. The RS-PE scheme is selectively secure under a chosen plaintext attack if
the DBDH and the DLIN assumptions hold. That is, for any PPT adversary $\mathcal{A}$, we have
that $\mathsf{Adv}^{RS\text{-}PE}_{\mathcal{A}}(\lambda) \le 2\,\mathsf{Adv}^{DLIN}_{\mathcal{B}}(\lambda) + \mathsf{Adv}^{DBDH}_{\mathcal{B}}(\lambda)$.

Abstract. Boneh, Raghunathan, and Segev (CRYPTO '13) have recently
put forward the notion of function privacy and applied it to
identity-based encryption, motivated by the need for providing predi- 
cate privacy in public-key searchable encryption. Intuitively, their notion 
asks that decryption keys reveal essentially no information on their cor- 
responding identities, beyond the absolute minimum necessary. While 
Boneh et al. showed how to construct function-private identity-based 
encryption (which implies predicate-private encrypted keyword search), 
searchable encryption typically requires a richer set of predicates. 

In this paper we significantly extend the function privacy framework. 
First, we consider the notion of subspace-membership encryption, a gen- 
eralization of inner-product encryption, and formalize a meaningful and 
realistic notion for capturing its function privacy. Then, we present a 
generic construction of a function-private subspace-membership encryp- 
tion scheme based on any inner-product encryption scheme. This is 
the first generic construction that yields a function-private encryption 
scheme based on a non-function-private one. 

Finally, we present various applications of function-private subspace- 
membership encryption. Among our applications, we significantly im- 
prove the function privacy of the identity-based encryption schemes of 
Boneh et al.: whereas their schemes are function private only for identities
that are highly unpredictable (with min-entropy of at least $\lambda + \omega(\log\lambda)$
bits, where $\lambda$ is the security parameter), we obtain function-private
schemes assuming only the minimal required unpredictability
(i.e., min-entropy of only $\omega(\log\lambda)$ bits). This improvement offers a much
more realistic function privacy guarantee. 

Keywords: Function privacy, functional encryption. 


1 Introduction 

Predicate encryption systems [13, 23] are public-key schemes where a single public
encryption key has many corresponding secret keys: every secret key corresponds 

* The full version is available as Cryptology ePrint Archive, Report 2013/403 [11].

** This work was done while the author was visiting Stanford University. 

K. Sako and P. Sarkar (Eds.) ASIACRYPT 2013 Part I, LNCS 8269, pp. 255-275, 2013.

© International Association for Cryptologic Research 2013 




D. Boneh, A. Raghunathan, and Gil Segev 


to a predicate $p : \Sigma \to \{0,1\}$ where $\Sigma$ is some pre-defined set of indices (or
attributes). Plaintext messages are pairs $(x, m)$ where $x \in \Sigma$ and $m$ is in some
message space. A secret key $sk_p$ for a predicate $p$ has the following semantics: if
$c$ is an encryption of the pair $(x, m)$ then $sk_p$ can be used to decrypt $c$ only if the
“index” $x$ satisfies the predicate $p$. More precisely, attempting to decrypt $c$ using
$sk_p$ will output $m$ if $p(x) = 1$ and output $\perp$ otherwise. A predicate encryption
system is secure if it provides semantic security for the pair $(x, m)$ even if the
adversary has a few benign secret keys.

The simplest example of predicate encryption is a system supporting the set
of equality predicates, that is, predicates $p_{id} : \Sigma \to \{0,1\}$ defined as $p_{id}(x) = 1$
iff $x = id$. In such a system there is a secret key $sk_{id}$ for every $id \in \Sigma$ and given
the encryption $c$ of a pair $(x, m)$ the key $sk_{id}$ can decrypt $c$ and recover $m$ only
when $x = id$. It is easy to see that predicate encryption for the set of equality
predicates is the same thing as (anonymous) identity-based encryption [8, 1].

Currently the most expressive collusion-resistant predicate encryption systems
[23, 3] support the family of inner product predicates: for a vector space
$\Sigma = \mathbb{F}_q^{\ell}$ this is the set of predicates $p_v : \Sigma \to \{0,1\}$ where $v \in \Sigma$ and $p_v(x) = 1$
iff $x \perp v$. This family of predicates includes the set of equality predicates and
others.

Searching on Encrypted Data. Predicate encryption systems provide a gen- 
eral framework for searching on encrypted data. Consider a mail gateway whose 
function is to route incoming user email based on characteristics of the email. For 
example, emails from “boss” that are marked “urgent” are routed to the user’s 
cell phone as are all emails from “spouse.” All other emails are routed to the 
user’s desktop. When the emails are transmitted in the clear the gateway’s job is 
straightforward. However, when the emails are encrypted with the user’s public
key the gateway cannot see data needed for the routing decision. The simplest 
solution is to give the gateway the user’s secret key, but this enables the gateway 
to decrypt all emails and exposes more information than the gateway needs. 

A better solution is to encrypt emails using predicate encryption. The email 
header functions as the index $x$ and the routing instructions are used as $m$.
The gateway is given a secret key sk p corresponding to the “route to cell phone” 
predicate. This secret key enables the gateway to learn the routing instructions 
for messages satisfying the predicate p, but learn nothing else about emails. 

Function Privacy. A limitation of many existing predicate encryption systems 
is that the secret key sk p reveals information about the predicate p. As a result, 
the gateway, and anyone else who has access to sk p , learns the predicate p. Since 
in many practical settings it is important to keep the predicate p secret, our goal 
is to provide function privacy : sk p should reveal as little information about p as 
possible. 

At first glance it seems that hiding $p$ is impossible: given $sk_p$ the gateway can
itself encrypt messages $(x, m)$ and then apply $sk_p$ to the resulting ciphertext. In doing
so the gateway learns if $p(x) = 1$ which reveals some information about $p$.
Nevertheless, despite this inherent limitation, function privacy can still be achieved.

Towards a Solution. In recent work Boneh, Raghunathan, and Segev [10]
put forward a new notion of function privacy and applied it to identity-based
encryption systems (i.e. to predicate encryption supporting equality predicates).
They observe that if the identity $id$ is chosen from a distribution with
super-logarithmic min-entropy then the inherent limitation above is not a problem since
the attacker cannot learn $id$ from $sk_{id}$ by a brute force search since there are too
many potential identities to test. They define function privacy for IBE systems
by requiring that when $id$ has sufficient min-entropy then $sk_{id}$ is indistinguishable
from a secret key derived for an independently and uniformly distributed identity.
This enables function private keyword searching on encrypted data. They then
construct several IBE systems supporting function-private keyword searching.

While Boneh et al. [10] showed how to achieve function privacy for equality
predicates, encrypted search typically requires a richer set of searching predi- 
cates, including conjunctions, disjunctions, and many others. The authors left 
open the important question of achieving function privacy for a larger family of 
predicates. 

Our Contributions. In this paper we extend the framework and techniques of
Boneh et al. [10] for constructing function-private encryption schemes. We put
forward a generalization of inner-product predicate encryption [23, 18, 3], which
we denote subspace-membership encryption, and present a definitional framework
for capturing its function privacy. Our framework identifies the minimal
restrictions under which a strong and meaningful notion of function privacy can
be obtained for subspace-membership encryption schemes.

Then, we present a generic construction of a function-private subspace mem- 
bership encryption scheme based on any underlying inner-product encryption 
scheme (even when the underlying scheme is not function private). Our construc- 
tion is efficient, and in addition to providing function privacy, it preserves the 
security properties of the underlying scheme. This is the first generic construction 
that yields a function-private encryption scheme based on a non-function-private 
one. Recall that even for the simpler case of identity-based encryption, Boneh et
al. [10] were not able to provide a generic construction, and had to individually
modify various existing schemes. 

Finally, we present various applications of function-private subspace-membership
encryption (we refer the reader to Section 1.1 for an overview of these
applications). Among our applications, we significantly improve the function
privacy of the identity-based encryption schemes of Boneh et al. [10]. Specifically,
whereas their schemes guarantee function privacy only for identity distributions
that are highly unpredictable (with min-entropy of at least $\lambda + \omega(\log\lambda)$
bits, where $\lambda$ is the security parameter), we construct schemes that guarantee
function privacy assuming only minimal unpredictability (i.e., min-entropy of
$\omega(\log\lambda)$ bits). This improvement presents a much more realistic function
privacy guarantee.

1.1 Overview of Our Contributions

A subspace-membership encryption scheme is a predicate encryption scheme
supporting subspace-membership predicates. That is, an encryption of a message
is associated with an attribute $x \in \mathbb{S}^{\ell}$, and secret keys are derived for subspaces
defined by all vectors in $\mathbb{S}^{\ell}$ orthogonal to a matrix $W \in \mathbb{S}^{m \times \ell}$ (for integers
$m, \ell \in \mathbb{N}$ and an additive group $\mathbb{S}$).$^1$ Decryption recovers the message iff $W \cdot x =
0$. We refer the reader to [11, Section 2.3] for the standard definitions of the
functionality and data security of predicate encryption (following [23, 3]).

Function Privacy for Subspace-Membership Encryption. Our goal is to
design subspace-membership encryption schemes in which a secret key, $sk_W$,
does not reveal any information, beyond the absolute minimum necessary, on
the matrix $W$. Formalizing a realistic notion of function privacy, however, is not
straightforward due to the actual functionality of subspace-membership
encryption. Specifically, assuming that an adversary who is given a secret
key $sk_W$ has some a-priori information that the matrix $W$ belongs to a small set
of matrices (e.g., $\{W_0, W_1\}$), then the adversary may be able to fully recover
$W$: The adversary simply needs to encrypt a (possibly random) message $m$ for
some attribute $x$ that is orthogonal to $W_0$ but not to $W_1$, and then run the
decryption algorithm on the given secret key $sk_W$ and the resulting ciphertext
to identify the one that decrypts correctly. In fact, as in [10], as long as the
adversary has some a-priori information according to which the matrix $W$ is
sampled from a distribution whose min-entropy is at most logarithmic in the
security parameter, there is a non-negligible probability for a full recovery.

In the setting of subspace-membership encryption (unlike that of identity-based
encryption [10]), however, the requirement that $W$ is sampled from a
source of high min-entropy does not suffice for obtaining a meaningful notion
of function privacy. In Section 5 we show that even if $W$ has nearly full
min-entropy, but two of its columns may be correlated, then a meaningful notion of
function privacy is not within reach.

In this light, our notion of function privacy for subspace-membership encryption
schemes focuses on secret keys $sk_W$ for which the columns of $W$ form a block source. That
is, each column of $W$ should have a reasonable amount of min-entropy even given
all previous columns. Our notion of function privacy requires that such a secret
key $sk_W$ (where $W$ is sampled from an adversarially-chosen distribution) be
indistinguishable from a secret key for a subspace chosen uniformly at random.

A Function-Private Construction from Inner-product Encryption.
Given any underlying inner-product encryption scheme we construct a
function-private subspace-membership encryption scheme quite naturally. We modify the
key-generation algorithm as follows: for generating a secret key for a subspace
described by $W$, we first sample a uniform $s \leftarrow \mathbb{S}^m$ and use the key-generation
algorithm of the underlying scheme for generating a secret key for the vector $v = W^{T} s$.
Observe that as long as the columns of $W$ form a block source, then the leftover
hash lemma for block sources guarantees that $v$ is statistically close to uniform. In
particular, essentially no information on $W$ is revealed.

$^1$ Note that by setting $m = 1$ one obtains the notion of an inner-product encryption
scheme [23, 18, 3].

We also observe that extracting from the columns of $W$ using the same seed for
the extractor $\langle s, \cdot \rangle$ interacts nicely with the subspace-membership functionality.
Indeed, if $W \cdot x = 0$, it holds that $v^{T} x = 0$ and vice-versa with high probability.
We note that the case where the attribute set is small requires some additional
refinement that we omit from this overview, and we refer the reader to Section
3 for more details.
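The key-derivation step above can be illustrated with a few lines of Python. This is a toy sketch over a prime field, not the construction itself: the modulus `P`, the function names, and the optional explicit seed `s` (useful for reproducible checks) are assumptions made here, and the underlying inner-product key generation is omitted entirely.

```python
import secrets
from typing import List, Optional

P = 2**61 - 1  # toy prime modulus standing in for the group S (assumption)


def derive_vector(w: List[List[int]], s: Optional[List[int]] = None, p: int = P) -> List[int]:
    """Compute v = W^T s over Z_p, sampling the seed s uniformly if not given."""
    m, l = len(w), len(w[0])
    if s is None:
        s = [secrets.randbelow(p) for _ in range(m)]
    return [sum(s[i] * w[i][j] for i in range(m)) % p for j in range(l)]


def inner(a: List[int], b: List[int], p: int = P) -> int:
    """Inner product <a, b> over Z_p."""
    return sum(x * y for x, y in zip(a, b)) % p
```

If $W \cdot x = 0$ then $\langle v, x \rangle = s^{T} W x = 0$ for every seed; if $W \cdot x \neq 0$, the inner product is a nonzero linear form in a uniform $s$ and so vanishes only with probability $1/p$ in this toy setting.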

Application 1: Function Privacy When Encrypting to Roots of Polynomials.
We consider predicate encryption schemes supporting polynomial evaluation
where secret keys correspond to polynomials $p \in \mathbb{S}[X]$ and messages are
encrypted to an attribute $x \in \mathbb{S}$. Given a secret key $sk_p$ and a ciphertext with
an attribute $x$, decryption recovers the message iff $p(x)$ evaluates to 0. Our work
constructs such schemes from any underlying subspace-membership scheme.

We also explore the notion of function privacy for such polynomial encryption
schemes. We require that secret keys for degree-$d$ polynomials $p(x)$ with
coefficients $(p_0, p_1, \ldots, p_d) \in \mathbb{S}^{d+1}$ coming from a sufficiently unpredictable
adversarially chosen (joint) distribution be indistinguishable from secret keys for
degree-$d$ polynomials where each coefficient is sampled uniformly from the
underlying set. Unlike the case of subspace membership, we do not restrict our
security to those distributions whose unpredictability holds even when
conditioned on all previous coefficients (i.e., here we obtain security for any min-entropy source
and not only for block sources).

Our function-private construction maps attributes $x$ to Vandermonde vectors
$\mathbf{x} = (1, x, x^2, \ldots)$ and a polynomial $p(x)$ to a subspace $W$ as follows. We sample
$d+1$ polynomials $r_1(x), \ldots, r_{d+1}(x)$ in a particular manner (as a product of $d$
uniformly random linear polynomials) and construct the subspace $W$ whose $i$th
row comprises the coefficients of $p(x) \cdot r_i(x)$. In Section 4 we elaborate on the
details and prove that our choice of randomizing polynomials allows us to show
that for polynomials whose coefficients come from an unpredictable distribution,
$W$’s columns have conditional unpredictability. Similarly, for polynomials
with uniformly distributed coefficients, $W$’s columns are uniformly distributed.
This allows us to infer the function privacy of the polynomial encryption scheme
from the function privacy of the underlying subspace-membership encryption
scheme.
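The mapping from a polynomial to a subspace can be sketched as follows. This Python fragment is an illustration under stated assumptions: it works over a toy prime field with modulus `P`, stores coefficients in ascending order, and samples each linear factor as $a x + b$ with $a, b$ uniform, which is one reading of "uniformly random linear polynomials" and may differ from the exact sampling used in the full construction. All function names are hypothetical.

```python
import secrets
from typing import List

P = 2**61 - 1  # toy prime modulus (assumption)


def poly_mul(a: List[int], b: List[int], p: int = P) -> List[int]:
    """Multiply two polynomials given as ascending coefficient lists over Z_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def randomizer(d: int, p: int = P) -> List[int]:
    """Product of d random linear polynomials b + a*x (sampling is an assumption)."""
    r = [1]
    for _ in range(d):
        r = poly_mul(r, [secrets.randbelow(p), secrets.randbelow(p)], p)
    return r


def subspace_for_poly(coeffs: List[int], p: int = P) -> List[List[int]]:
    """Rows are the coefficients of p(x) * r_i(x) for i = 1, ..., d+1."""
    d = len(coeffs) - 1
    return [poly_mul(coeffs, randomizer(d, p), p) for _ in range(d + 1)]


def vandermonde(x: int, length: int, p: int = P) -> List[int]:
    """Attribute vector (1, x, x^2, ...) of the given length."""
    return [pow(x, k, p) for k in range(length)]


def mat_vec(w: List[List[int]], x: List[int], p: int = P) -> List[int]:
    """Matrix-vector product W * x over Z_p."""
    return [sum(a * b for a, b in zip(row, x)) % p for row in w]
```

Entry $i$ of $W \cdot \mathbf{x}$ equals $p(x) \cdot r_i(x)$, so the product is the zero vector whenever $x$ is a root of $p$, and is nonzero for a non-root unless every $r_i$ happens to vanish at $x$.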

Application 2: Function-Private IBE with Minimal Unpredictability.
As another interesting application of predicate encryption supporting polynomial
evaluation, we consider the question of constructing function-private IBE
schemes whose function privacy requires only the minimal necessary
unpredictability assumption. It is easy to see (and as was shown in [10]) that if
the adversary has some a-priori information according to which identities are
sampled from a distribution with only logarithmic bits of entropy, then a simple
adversary recovers $id$ from $sk_{id}$ with non-negligible probability by simply encrypting
a message to a guessed $id$ and checking if decryption recovers the message
successfully.

The constructions of [10] use a technique of preprocessing the $id$ with a randomness
extractor to recover $id_{Ext}$ that is statistically close to uniform and thus hides any
information about the underlying distribution of identities. As the extracted
identity is roughly $\lambda$ bits long, the distribution of identities must have
min-entropy at least $\lambda + \omega(\log\lambda)$ bits to guarantee that extraction works. The identity
space is much larger, so this is still a meaningful notion of function privacy,
but the question of designing schemes that require the minimal min-entropy of
$\omega(\log\lambda)$ bits was left open.

Starting from encryption schemes supporting polynomial evaluation (for our
construction, linear polynomials suffice), this work shows how to construct
function-private IBE schemes with the only restriction on identities being that
they are unpredictable. We consider identities in a set $\mathbb{S}$ and consider a polynomial
$p_{id}(x) = (x - id)$. By first randomizing the polynomial with uniformly chosen $r$ in
$\mathbb{S}$, we observe that if $id$ has the minimal super-logarithmic unpredictability, then
the coefficients of the polynomial $r \cdot (x - id)$ have sufficient unpredictability. Thus,
considering polynomial encryption schemes where secret keys correspond to such
polynomials and attributes correspond to $x = id$, we construct IBE schemes that
are function private against distributions that only have the minimum necessary
unpredictability.
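The functionality side of this reduction is easy to check numerically. In the Python sketch below, a key for identity $id$ is represented by the coefficient vector of $r \cdot (x - id)$ and an attribute $x$ by the vector $(1, x)$. The toy prime modulus, the function names, and the choice to sample $r$ as nonzero (so that matching is exact rather than correct up to negligible error) are assumptions made for illustration.

```python
import secrets
from typing import List, Optional

P = 2**61 - 1  # toy prime modulus (assumption)


def key_vector(identity: int, r: Optional[int] = None, p: int = P) -> List[int]:
    """Ascending coefficients of r * (x - id); r is sampled nonzero here (assumption)."""
    if r is None:
        r = 1 + secrets.randbelow(p - 1)
    return [(-r * identity) % p, r % p]


def attribute_vector(x: int, p: int = P) -> List[int]:
    """Vandermonde vector (1, x) for a linear polynomial."""
    return [1, x % p]


def matches(key: List[int], attr: List[int], p: int = P) -> bool:
    """True iff the inner product, i.e. r * (x - id), is zero in Z_p."""
    return sum(a * b for a, b in zip(key, attr)) % p == 0
```

The sketch only illustrates correctness of the encoding; the function-privacy argument depends on the entropy of the coefficient pair $(-r \cdot id, r)$, which code of this kind cannot demonstrate.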

1.2 Related Work 

As discussed above, the notion of function privacy was recently put forward 
by Boneh, Raghunathan, and Segev [10]. One of the main motivations of
Boneh et al. was that of designing public-key searchable encryption schemes 
[8l20llll3l28ll23l5ll4l2l3j that are keyword private. That is, public-key searchable 
encryption schemes in which search tokens hide, as much as possible, their corre- 
sponding predicates. They presented a framework for modeling function privacy, 
and constructed various function-private anonymous identity-based encryption 
schemes (which, in particular, imply public-key keyword-private searchable en- 
cryption schemes). 

More generally, the work of Boneh et al. initiated the study of function pri- 
vacy in functional encryption [121261612114119] . where a functional secret key 
$sk_f$ corresponding to a function $f$ makes it possible to compute $f(m)$ given an encryption
$c = \mathsf{Enc}_{pk}(m)$. Intuitively, in this setting function privacy guarantees that a
functional secret key $sk_f$ does not reveal information about $f$ beyond what is already
known and what can be obtained by running the decryption algorithm on test 
ciphertexts. In [10], the authors also discuss connections of function privacy to 
program obfuscation. 

Our notion of subspace-membership encryption generalizes that of inner- 
product encryption introduced by Katz, Sahai, and Waters [ 23 ]. They defined 
and constructed predicate encryption schemes for predicates corresponding to 
inner products over $\mathbb{Z}_N$ (for some large $N$). Informally, this class of predicates
corresponds to functions $f_v$ where $f_v(x) = 1$ if and only if $\langle v, x \rangle = 0$.
Subsequently, Freeman [18] modified their construction to inner products over groups
of prime order $p$, and Agrawal, Freeman, and Vaikuntanathan [3] constructed an
inner-product encryption scheme over $\mathbb{Z}_p$ for a small prime $p$. Other results on
inner product encryption study adaptive security delegation in the context 
of hierarchies m, and generalized IBE [9]. 

Finally, we note that function privacy in the symmetric-key setting, where
the encryptor and decryptor have a shared secret key, was studied by Shen,
Shi, and Waters. They designed a function-private inner-product encryption
scheme. As noted by Boneh et al. [10], achieving function privacy in the
public-key setting is a more subtle task due to the inherent conflict between privacy
and functionality.

1.3 Notation 

For an integer $n \in \mathbb{N}$ we denote by $[n]$ the set $\{1, \ldots, n\}$, and by $U_n$ the uniform
distribution over the set $\{0,1\}^n$. For a random variable $X$ we denote by $x \leftarrow X$
the process of sampling a value $x$ according to the distribution of $X$. Similarly,
for a finite set $S$ we denote by $x \leftarrow S$ the process of sampling a value $x$ according
to the uniform distribution over $S$. We denote by $\mathbf{x}$ (and sometimes $x$) a vector
$(x_1, \ldots, x_{|\mathbf{x}|})$. We denote by $\mathbf{X} = (X_1, \ldots, X_T)$ a joint distribution of $T$ random
variables. A non-negative function $f : \mathbb{N} \to \mathbb{R}$ is negligible if it vanishes faster
than any inverse polynomial. A non-negative function $f : \mathbb{N} \to \mathbb{R}$ is
super-polynomial if it grows faster than any polynomial.

The min-entropy of a random variable $X$ is $H_{\infty}(X) = -\log(\max_x \Pr[X = x])$.
A $k$-source is a random variable $X$ with $H_{\infty}(X) \ge k$. A $(T,k)$-block source is
a random variable $\mathbf{X} = (X_1, \ldots, X_T)$ where for every $i \in [T]$ and $x_1, \ldots, x_{i-1}$
it holds that $H_{\infty}(X_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) \ge k$. The statistical distance
between two random variables $X$ and $Y$ over a finite domain $\Omega$ is $SD(X, Y) =
\frac{1}{2} \sum_{\omega \in \Omega} |\Pr[X = \omega] - \Pr[Y = \omega]|$. Two random variables $X$ and $Y$ are $\delta$-close
if $SD(X, Y) \le \delta$. Two distribution ensembles $\{X_\lambda\}_{\lambda \in \mathbb{N}}$ and $\{Y_\lambda\}_{\lambda \in \mathbb{N}}$ are
statistically indistinguishable if it holds that $SD(X_\lambda, Y_\lambda)$ is negligible in $\lambda$. They
are computationally indistinguishable if for every probabilistic polynomial-time
algorithm $\mathcal{A}$ it holds that $|\Pr[\mathcal{A}(1^\lambda, x) = 1] - \Pr[\mathcal{A}(1^\lambda, y) = 1]|$ is negligible in
$\lambda$, where $x \leftarrow X_\lambda$ and $y \leftarrow Y_\lambda$.
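For finite distributions, the two quantities defined above are straightforward to compute. The following Python sketch represents a distribution as a dictionary from outcomes to probabilities; this representation, the function names, and the use of base-2 logarithms (so that min-entropy is measured in bits) are assumptions made for illustration.

```python
import math
from typing import Dict, Hashable


def min_entropy(dist: Dict[Hashable, float]) -> float:
    """H_inf(X) = -log2(max_x Pr[X = x]), in bits."""
    return -math.log2(max(dist.values()))


def statistical_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """SD(X, Y) = 1/2 * sum over the joint support of |Pr[X = w] - Pr[Y = w]|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)
```

For example, the uniform distribution on four outcomes has min-entropy 2 bits, and a fair coin is at statistical distance 1/2 from a constant.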

1.4 Paper Organization 

The remainder of this paper is organized as follows. Due to space constraints,
we refer the reader to the full version [11, Section 2] for standard definitions
and tools. In Section 2 we introduce the notions of subspace-membership
encryption and function privacy for subspace-membership encryption. In Section
3 we present a generic construction of a function-private subspace-membership
encryption scheme based on any inner-product encryption scheme. In Section 4
we present various applications of function-private subspace-membership
encryption. In Section 5 we discuss several open problems that arise from this work.

2 Subspace-Membership Encryption and Its Function Privacy

In this section we formalize the notion of subspace-membership encryption and
its function privacy within the framework of Boneh, Raghunathan and Segev [10].
A subspace-membership encryption scheme is a predicate encryption scheme
[13, 23] supporting the class of predicates $\mathcal{F}_{m,\ell}$ over an attribute space $\Sigma = \mathbb{S}^{\ell}$,
defined as

$$\mathcal{F}_{m,\ell} = \left\{ f_W : \mathbb{S}^{\ell} \to \{0,1\} \;\middle|\; W \in \mathbb{S}^{m \times \ell} \right\}, \qquad f_W(x) = 1 \iff W \cdot x = 0,$$

for integers $m, \ell \in \mathbb{N}$, and an additive group $\mathbb{S}$. Informally, in a
subspace-membership encryption, an encryption of a message is associated with an
attribute $x \in \mathbb{S}^{\ell}$, and secret keys are derived for subspaces defined by all vectors
in $\mathbb{S}^{\ell}$ orthogonal to a matrix $W \in \mathbb{S}^{m \times \ell}$. Decryption recovers the message if
and only if $W \cdot x = 0$. (See [11, Section 2.3] for the standard definitions of the
functionality and data security of predicate encryption.) Subspace-membership
encryption with delegation was also studied in [24, 25]. Here we do not need the
delegation property.

Based on the framework introduced by Boneh, Raghunathan, and Segev [10],
our notion of function privacy for subspace-membership encryption considers
adversaries that are given the public parameters of the scheme and can interact
with a “real-or-random” function-privacy oracle $\mathsf{RoR}^{\mathsf{FP}}$ defined as follows, and
with a key-generation oracle.

Definition 2.1 (Real-or-random function-privacy oracle). The real-or-random
function-privacy oracle $\mathsf{RoR}^{\mathsf{FP}}$ takes as input triplets of the form (mode,
msk, $\mathbf{V}$), where mode $\in$ {real, rand}, msk is a master secret key, and $\mathbf{V} = (V_1, \ldots,
V_\ell) \in \mathbb{S}^{m \times \ell}$ is a circuit representing a joint distribution over $\mathbb{S}^{m \times \ell}$ (i.e., each $V_i$
is a distribution over $\mathbb{S}^{m}$). If mode = real then the oracle samples $W \leftarrow \mathbf{V}$ and
if mode = rand then the oracle samples $W \leftarrow \mathbb{S}^{m \times \ell}$ uniformly. It then invokes
the algorithm KeyGen(msk, $\cdot$) on $W$ for outputting a secret key $sk_W$.
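The oracle of Definition 2.1 can be sketched in Python as follows. This is a toy model, not a secure implementation: the group $\mathbb{S}$ is replaced by integers modulo `q`, the circuit $\mathbf{V}$ by a zero-argument sampler `sample_v`, and `keygen` by any callable supplied by the caller. All of these names and representations are assumptions made for illustration.

```python
import secrets
from typing import Any, Callable, List

Matrix = List[List[int]]


def ror_fp_oracle(
    mode: str,
    msk: Any,
    sample_v: Callable[[], Matrix],        # hypothetical sampler for the distribution V
    keygen: Callable[[Any, Matrix], Any],  # hypothetical stand-in for KeyGen(msk, .)
    m: int,
    l: int,
    q: int,
) -> Any:
    """Sample W from V (mode 'real') or uniformly from Z_q^{m x l} (mode 'rand'),
    then return the secret key produced by keygen(msk, W)."""
    if mode == "real":
        w = sample_v()
    elif mode == "rand":
        w = [[secrets.randbelow(q) for _ in range(l)] for _ in range(m)]
    else:
        raise ValueError("mode must be 'real' or 'rand'")
    return keygen(msk, w)
```

An adversary in the sense of Definition 2.2 would be restricted to samplers whose columns form a block source; the sketch does not enforce that condition.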

Definition 2.2 (Function-privacy adversary). An $(\ell, k)$-block-source
function-privacy adversary $\mathcal{A}$ is an algorithm that is given as input a pair $(1^\lambda, pp)$
and oracle access to $\mathsf{RoR}^{\mathsf{FP}}$(mode, msk, $\cdot$) for some mode $\in$ {real, rand}, and to
KeyGen(msk, $\cdot$). It is required that each of $\mathcal{A}$’s queries to $\mathsf{RoR}^{\mathsf{FP}}$ be an $(\ell, k)$-block-source.


Definition 2.3 (Function-private subspace-membership encryption). A
subspace-membership encryption scheme $\Pi$ = (Setup, KeyGen, Enc, Dec) is
$(\ell, k)$-block-source function private if for any probabilistic polynomial-time
$(\ell, k)$-block-source function-privacy adversary $\mathcal{A}$, there exists a negligible
function $\nu(\lambda)$ such that

$$\mathbf{Adv}^{\mathsf{FP}}_{\Pi,\mathcal{A}}(\lambda) := \left| \Pr\left[\mathsf{Expt}^{\mathsf{real}}_{\mathsf{FP},\Pi,\mathcal{A}}(\lambda) = 1\right] - \Pr\left[\mathsf{Expt}^{\mathsf{rand}}_{\mathsf{FP},\Pi,\mathcal{A}}(\lambda) = 1\right] \right| \le \nu(\lambda),$$

where for each mode $\in$ {real, rand} and $\lambda \in \mathbb{N}$ the experiment
$\mathsf{Expt}^{\mathsf{mode}}_{\mathsf{FP},\Pi,\mathcal{A}}(\lambda)$ is defined as follows:

1. $(\mathsf{pp}, \mathsf{msk}) \leftarrow \mathsf{Setup}(1^\lambda)$.

2. $b \leftarrow \mathcal{A}^{\mathsf{RoR}^{\mathsf{FP}}(\mathsf{mode}, \mathsf{msk}, \cdot),\, \mathsf{KeyGen}(\mathsf{msk}, \cdot)}(1^\lambda, \mathsf{pp})$.

3. Output $b$.

In addition, such a scheme is statistically $(\ell, k)$-block-source function private if
the above holds for all computationally-unbounded $(\ell, k)$-block-source
function-privacy adversaries making a polynomial number of queries to the $\mathsf{RoR}^{\mathsf{FP}}$ oracle.
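
The real-or-random experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's formalism: the group is modeled as the integers modulo a toy prime, a source V is modeled as a zero-argument sampler that returns an m-by-ell matrix, and `toy_setup`, `leaky_keygen`, `last_row_zero_source` and `toy_adversary` are hypothetical names introduced here. The key generator is broken on purpose (it returns W itself), so the empirical advantage comes out close to 1.

```python
import secrets

def ror_fp_oracle(mode, msk, sample_from_V, keygen, p, m, ell):
    """Real-or-random oracle: key for W drawn from V (real) or uniform W (rand)."""
    if mode == "real":
        W = sample_from_V()
    elif mode == "rand":
        W = [[secrets.randbelow(p) for _ in range(ell)] for _ in range(m)]
    else:
        raise ValueError("mode must be 'real' or 'rand'")
    return keygen(msk, W)

def experiment(mode, setup, keygen, adversary, p, m, ell):
    """Steps 1-3 of the experiment: run setup, run the adversary, output its bit."""
    pp, msk = setup()
    ror = lambda sample_from_V: ror_fp_oracle(mode, msk, sample_from_V, keygen, p, m, ell)
    key_oracle = lambda W: keygen(msk, W)
    return adversary(pp, ror, key_oracle)

def estimate_advantage(setup, keygen, adversary, p, m, ell, trials=200):
    """Empirical |Pr[real outputs 1] - Pr[rand outputs 1]|."""
    real = sum(experiment("real", setup, keygen, adversary, p, m, ell) for _ in range(trials))
    rand = sum(experiment("rand", setup, keygen, adversary, p, m, ell) for _ in range(trials))
    return abs(real - rand) / trials

# Hypothetical toy instantiation (assumption, for illustration only).
P_TOY, M_TOY, ELL_TOY = 101, 2, 3

def toy_setup():
    return "pp", "msk"

def leaky_keygen(msk, W):
    return W  # broken on purpose: the "secret key" reveals W

def last_row_zero_source():
    # Columns are independent and uniform over vectors whose last entry is 0,
    # so each column keeps (m - 1) * log2(p) bits of min-entropy given the others.
    rows = [[secrets.randbelow(P_TOY) for _ in range(ELL_TOY)] for _ in range(M_TOY - 1)]
    return rows + [[0] * ELL_TOY]

def toy_adversary(pp, ror, key_oracle):
    key = ror(last_row_zero_source)
    return int(all(entry == 0 for entry in key[-1]))
```

Calling `estimate_advantage(toy_setup, leaky_keygen, toy_adversary, P_TOY, M_TOY, ELL_TOY)` on this toy instantiation shows why a scheme whose keys expose W cannot satisfy Definition 2.3.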

We note here that a security model that allows the adversary to receive the 
master secret key msk in place of the oracle KeyGen(msk, •) leads to a seemingly 
stronger notion of function privacy. However, such a notion is subsumed by 
statistical function privacy and the schemes constructed in this paper actually 
satisfy this stronger notion. 

Multi-shot vs. Single-shot Adversaries. Note that Definition 2.3 considers
adversaries that query the function-privacy oracle any polynomial number of
times. In fact, as adversaries are also given access to the key-generation oracle,
this "multi-shot" definition is polynomially equivalent to its "single-shot" variant
in which adversaries query the real-or-random function-privacy oracle $\mathsf{RoR}^{\mathsf{FP}}$ at
most once. This is proved via a straightforward hybrid argument, where the
hybrids are constructed such that only one query is forwarded to the
function-privacy oracle, and all other queries are answered using the key-generation oracle.

The Block-source Requirement on the Columns of W. Our definition of
function privacy for subspace-membership encryption requires that a secret key
$\mathsf{sk}_W$ reveals no unnecessary information about $W$ as long as the columns of $W$
form a block source (i.e., each column is unpredictable even given the previous
columns). One might consider a stronger definition, in which the columns of
$W$ may be arbitrarily correlated, as long as each column of $W$ is sufficiently
unpredictable. Such a definition, however, is impossible to satisfy.

Specifically, consider the special case of inner-product encryption (i.e., $m = 1$),
and an adversary that queries the real-or-random oracle with a distribution over
vectors $w \in \mathbb{S}^{\ell}$ defined as follows: sample $\ell - 1$ independent and uniform values
$u_1, \ldots, u_{\ell-1} \leftarrow \mathbb{S}$ and output $w = (u_1, 2u_1, u_2, \ldots, u_{\ell-1})$. Such a distribution
clearly has high min-entropy (specifically, $(\ell - 1) \log |\mathbb{S}|$ bits), and each coordinate
of $w$ has min-entropy $\log |\mathbb{S}|$ bits. However, secret keys for vectors drawn from
this distribution can be easily distinguished from secret keys for vectors drawn
from the uniform distribution over $\mathbb{S}^{\ell}$: encrypt a message $M$ to the attribute
$x = (-2, 1, 0, \ldots, 0) \in \mathbb{S}^{\ell}$ and check to see if decryption succeeds in recovering
$M$. For $w$ drawn from the distribution above, $\langle w, x \rangle = -2u_1 + 2u_1 = 0$, so
decryption always succeeds. For a random vector $w \in \mathbb{S}^{\ell}$ the decryption succeeds
only with probability $1/|\mathbb{S}|$, giving the adversary an overwhelming advantage.

Therefore, restricting function-privacy adversaries to query the $\mathsf{RoR}^{\mathsf{FP}}$ oracle
only with sources whose columns form block sources is essential for achieving a
meaningful notion of function privacy.
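
The distinguishing test described above can be simulated directly. The following minimal Python sketch makes several assumptions that are not in the text: the group is modeled as the integers modulo a small prime `P`, the vector length `ELL` is fixed to 4, and the encryption scheme itself is abstracted away, so "decryption succeeds" is replaced by the inner-product check that decryption stands for.

```python
import secrets

P = 101   # toy prime; Z_P stands in for the group (assumption)
ELL = 4   # vector length (assumption)

def sample_correlated(ell=ELL, p=P):
    """w = (u1, 2*u1, u2, ..., u_{ell-1}) with each u_i uniform in Z_p."""
    u = [secrets.randbelow(p) for _ in range(ell - 1)]
    return [u[0], (2 * u[0]) % p] + u[1:]

def sample_uniform(ell=ELL, p=P):
    """w uniform in Z_p^ell."""
    return [secrets.randbelow(p) for _ in range(ell)]

def inner(a, b, p=P):
    return sum(x * y for x, y in zip(a, b)) % p

def test_attribute(ell=ELL, p=P):
    """The attribute x = (-2, 1, 0, ..., 0) used by the adversary."""
    return [(-2) % p, 1] + [0] * (ell - 2)

def hit_rate(sampler, trials=2000):
    """Fraction of sampled keys w for which <w, x> = 0 (decryption would succeed)."""
    x = test_attribute()
    return sum(inner(sampler(), x) == 0 for _ in range(trials)) / trials
```

With these toy parameters, `hit_rate(sample_correlated)` is exactly 1, while `hit_rate(sample_uniform)` concentrates around 1/P, which is the gap the adversary exploits.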

On Correlated $\mathsf{RoR}^{\mathsf{FP}}$ Queries. In Definition 2.2 we consider adversaries that
receive only a single secret key $\mathsf{sk}_W$ for each query to the $\mathsf{RoR}^{\mathsf{FP}}$ oracle. Our
definition easily generalizes to include adversaries that are allowed to query
the $\mathsf{RoR}^{\mathsf{FP}}$ oracle with correlated queries. More specifically, an adversary can
receive secret keys $\mathsf{sk}_{W_1}, \ldots, \mathsf{sk}_{W_T}$ for any parameter $T$ that is polynomial in
the security parameter. The $\mathsf{RoR}^{\mathsf{FP}}$ oracle samples subspaces $W_1, \ldots, W_T$ from
an adversarially chosen joint distribution over $(\mathbb{S}^{m \times \ell})^T$ with the restriction that
for every $1 \le i \le T$, the columns of $W_i$ come from an $(\ell, k)$-block-source even
conditioned on any fixed values for $W_1, \ldots, W_{i-1}$ (see footnote 2).

Function Privacy of Existing Inner-product Encryption Schemes. The
inner-product predicate encryption scheme from lattices [3] is trivially not
function private, as the secret key includes the corresponding function $f_v$ as part of
it (this is necessary for the decryption algorithm to work correctly). The scheme
constructed from bilinear groups with composite order [23], however, presents no
such obvious attack, but we were not able to prove its function privacy based on
any standard cryptographic assumption.

3 A Generic Construction Based on Inner-Product 
Encryption 

In this section we present a generic construction of a function-private
subspace-membership encryption scheme starting from any inner-product encryption
scheme. Due to space constraints, we deal here with a large attribute space $\mathbb{S}$ of
size super-polynomial in the security parameter $\lambda$, and only outline the idea of
extending our construction to the case when $|\mathbb{S}|$ is small (see [11, Section 4.2] for
the details).

Our Construction. Let $\mathcal{IP}$ = (IP.Setup, IP.KeyGen, IP.Enc, IP.Dec) be an
inner-product encryption scheme with attribute set $\Sigma = \mathbb{S}^{\ell}$. We construct a
subspace-membership encryption scheme $\mathcal{SM}$ = (SM.Setup, SM.KeyGen, SM.Enc, SM.Dec)
as follows.

- Setup: SM.Setup is identical to IP.Setup. On input the security parameter
  $1^\lambda$ it outputs public parameters pp and the master secret key msk by running
  IP.Setup($1^\lambda$).

- Key generation: SM.KeyGen takes as input the master secret key msk
  and a function $f_W$ where $W \in \mathbb{S}^{m \times \ell}$ and proceeds as follows. It samples
  uniform $s \leftarrow \mathbb{S}^m$ and computes $v = W^{\mathsf{T}} s \in \mathbb{S}^{\ell}$. Next, it samples a secret key
  $\mathsf{sk}_v \leftarrow$ IP.KeyGen(msk, $v$) and outputs $\mathsf{sk}_W := \mathsf{sk}_v$.

- Encryption: SM.Enc is identical to IP.Enc. On input the public parameters,
  an attribute $x \in \mathbb{S}^{\ell}$, and a message $M$, the algorithm outputs a ciphertext
  $c \leftarrow$ IP.Enc(pp, $x$, $M$).

- Decryption: SM.Dec is identical to IP.Dec. On input the public parameters
  pp, a secret key $\mathsf{sk}_W$, and a ciphertext $c$, it outputs $M \leftarrow$ IP.Dec(pp, $\mathsf{sk}_W$, $c$).

2 Or equivalently, the columns of $[W_1 \mid W_2 \mid \cdots \mid W_T]$ are distributed according to
a $(T\ell, k)$-block-source.

Correctness. Correctness of the construction follows from the correctness of
the underlying inner-product encryption scheme. For every $W \in \mathbb{S}^{m \times \ell}$ and every
$x \in \mathbb{S}^{\ell}$, it suffices to show the following:

- If $f_W(x) = 1$, then it holds that $W \cdot x = 0$. This implies $x^{\mathsf{T}} v = x^{\mathsf{T}} (W^{\mathsf{T}} s) = 0$
  and therefore SM.Dec correctly outputs $M$ as required.

- If $f_W(x) = 0$, then it holds that $e = W \cdot x \neq 0 \in \mathbb{S}^m$. As $x^{\mathsf{T}} v = x^{\mathsf{T}} (W^{\mathsf{T}} s) =
  e^{\mathsf{T}} s$, for any $e \neq 0$ the quantity $x^{\mathsf{T}} v$ is zero with probability $1/|\mathbb{S}|$ over choices
  of $s$. As $1/|\mathbb{S}|$ is negligible in $\lambda$ whenever $|\mathbb{S}|$ is super-polynomial in $\lambda$, the
  proof of correctness follows.
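
The key-generation step and the two correctness cases can be checked numerically. The sketch below is an assumption-laden illustration: the group is modeled as the integers modulo the prime 2^61 - 1, the underlying inner-product scheme is abstracted away, and only the vector v = W^T s that would be passed to IP.KeyGen is computed. The function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; Z_P stands in for a large attribute space (assumption)

def sm_key_vector(W, p=P):
    """Return v = W^T s for a fresh uniform s in Z_p^m (the input to IP.KeyGen)."""
    m, ell = len(W), len(W[0])
    s = [secrets.randbelow(p) for _ in range(m)]
    return [sum(W[i][j] * s[i] for i in range(m)) % p for j in range(ell)]

def inner(a, b, p=P):
    """Inner product of two vectors over Z_p."""
    return sum(x * y for x, y in zip(a, b)) % p

def mat_vec(W, x, p=P):
    """Matrix-vector product W x over Z_p."""
    return [inner(row, x, p) for row in W]
```

For example, with `W = [[1, 2, 3], [0, 1, 1]]` the attribute `[1, 1, P - 1]` satisfies W x = 0, so `inner(x, sm_key_vector(W))` is 0 for every choice of s, whereas for `[1, 0, 0]` the inner product equals the first coordinate of s and vanishes only with probability 1/P.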

Security. We state the following theorem about the security of our construction.

Theorem 3.1. If $\mathcal{IP}$ is an attribute hiding (resp. weakly attribute hiding)
inner-product encryption scheme for an attribute set $\mathbb{S}$ of size super-polynomial in the
security parameter, then it holds that:

1. The scheme $\mathcal{SM}$ is an attribute hiding (resp. weakly attribute hiding)
   subspace-membership encryption scheme under the same assumption as the security of
   the underlying inner-product encryption scheme.

2. The scheme $\mathcal{SM}$ when $m \ge 2$ is statistically function private for
   $(\ell, k)$-block-sources for any $\ell = \mathrm{poly}(\lambda)$ and $k \ge \log |\mathbb{S}| + \omega(\log \lambda)$.

Proof. We first prove the attribute-hiding property of the scheme, and then
prove its function privacy.

Attribute Hiding. The attribute-hiding property of $\mathcal{SM}$ follows from the
attribute-hiding property of $\mathcal{IP}$ in a rather straightforward manner. Given a challenger for
the attribute-hiding property of $\mathcal{IP}$, an $\mathcal{SM}$ adversary $\mathcal{A}$ can be simulated by an
algorithm $\mathcal{B}$ as follows: $\mathcal{A}$'s challenge attributes are forwarded to the $\mathcal{IP}$-challenger
and the resulting public parameters are published. Secret key queries can be
simulated by first sampling uniform $s \leftarrow \mathbb{S}^m$, then computing $v = W^{\mathsf{T}} s$ and
forwarding $v$ to the $\mathcal{IP}$ key generation oracle. Similarly, the challenge messages
from the adversary are answered by forwarding them to the challenger. In the
full version [11, Section 4.1], we elaborate on the details and show that if $Q$
denotes the number of secret key queries by $\mathcal{A}$, it holds that

$$\mathbf{Adv}_{\mathcal{IP},\mathcal{B}}(\lambda) \ge \mathbf{Adv}_{\mathcal{SM},\mathcal{A}}(\lambda) - 2Q/|\mathbb{S}|, \qquad (1)$$

thus completing the proof of the attribute hiding property of $\mathcal{SM}$.

Function Privacy. Let $\mathcal{A}$ be a computationally unbounded $(\ell, k)$-block-source
function-privacy adversary that makes a polynomial number $Q = Q(\lambda)$ of queries
to the $\mathsf{RoR}^{\mathsf{FP}}$ oracle. We prove that the distribution of $\mathcal{A}$'s view in the
experiment $\mathsf{Expt}^{\mathsf{real}}_{\mathsf{FP},\mathcal{SM},\mathcal{A}}$ is statistically close to the distribution of $\mathcal{A}$'s view in the
experiment $\mathsf{Expt}^{\mathsf{rand}}_{\mathsf{FP},\mathcal{SM},\mathcal{A}}$ (we refer the reader to Definition 2.3 for the
descriptions of these experiments). We denote these two distributions by $\mathsf{View}_{\mathsf{real}}$ and
$\mathsf{View}_{\mathsf{rand}}$, respectively.

As the adversary $\mathcal{A}$ is computationally unbounded, we assume without loss of
generality that $\mathcal{A}$ does not query the KeyGen(msk, $\cdot$) oracle; such queries can be
internally simulated by $\mathcal{A}$. Moreover, as discussed in Section 2, it suffices to focus
on adversaries $\mathcal{A}$ that query the $\mathsf{RoR}^{\mathsf{FP}}$ oracle exactly once. From this point on
we fix the public parameters pp chosen by the setup algorithm, and show that
the two distributions $\mathsf{View}_{\mathsf{real}}$ and $\mathsf{View}_{\mathsf{rand}}$ are statistically close for any such pp.

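Denote by $V = (V_1, \ldots, V_\ell)$ the random variable corresponding to the
$(\ell, k)$-source with which $\mathcal{A}$ queries the $\mathsf{RoR}^{\mathsf{FP}}$ oracle. For each $i \in [\ell]$, let
$(w_{i,1}, \ldots, w_{i,m})$ denote a sample from $V_i$. Also, let $s = (s_1, \ldots, s_m) \in \mathbb{S}^m$. As $\mathcal{A}$ is
computationally unbounded, and having fixed the public parameters, we can in fact
assume that

$$\mathsf{View}_{\mathsf{mode}} = \left( \sum_{j=1}^{m} s_j w_{1,j}, \; \ldots, \; \sum_{j=1}^{m} s_j w_{\ell,j} \right) \qquad (2)$$

for mode $\in$ {real, rand}, where $W = \{w_{i,j}\}_{i \in [\ell], j \in [m]}$ is drawn from $V$ for mode =
real, $W$ is uniformly distributed over $\mathbb{S}^{m \times \ell}$ for mode = rand, and $s_j \leftarrow \mathbb{S}$ for
every $j \in [m]$. This is exactly the vector $v = W^{\mathsf{T}} s$ computed by SM.KeyGen. For
mode $\in$ {real, rand} we prove that the distribution $\mathsf{View}_{\mathsf{mode}}$ is
statistically close to a uniform distribution over $\mathbb{S}^{\ell}$.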

Note that the collection of functions $\{ g_{s_1,\ldots,s_m} : \mathbb{S}^m \to \mathbb{S} \}_{s_1,\ldots,s_m \in \mathbb{S}}$ defined
by $g_{s_1,\ldots,s_m}(w_1, \ldots, w_m) = \sum_{j=1}^{m} s_j \cdot w_j$ is universal. This enables us to directly
apply the Leftover Hash Lemma for block-sources |1 fil22l2?)l1 7| implying that 
for our choice of parameters $m$, $\ell$ and $k$ the statistical distance between $\mathsf{View}_{\mathsf{real}}$
and the uniform distribution is negligible in $\lambda$ (see footnote 3). The same clearly holds also for
$\mathsf{View}_{\mathsf{rand}}$, as the uniform distribution over $\mathbb{S}^{m \times \ell}$ is, in particular, an
$(\ell, k)$-block-source. This completes the proof of function privacy.

∎
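
The universality claim used in the proof can be verified exhaustively for tiny parameters. The sketch below assumes the group is the integers modulo p with the usual multiplication, which is an instantiation chosen here for illustration; it checks that for every pair of distinct inputs, exactly a 1/p fraction of the keys s makes the two hash values collide.

```python
from itertools import product

def g(s, w, p):
    """Inner-product hash g_s(w) = sum_j s_j * w_j over Z_p."""
    return sum(si * wi for si, wi in zip(s, w)) % p

def collision_count(w, w2, p):
    """Number of keys s in Z_p^m with g_s(w) == g_s(w2), and the total key count."""
    keys = list(product(range(p), repeat=len(w)))
    hits = sum(g(s, w, p) == g(s, w2, p) for s in keys)
    return hits, len(keys)

def is_universal(p, m):
    """True if every pair of distinct inputs collides on exactly a 1/p fraction of keys."""
    points = list(product(range(p), repeat=m))
    for w in points:
        for w2 in points:
            if w != w2:
                hits, total = collision_count(w, w2, p)
                if hits * p != total:
                    return False
    return True
```

For a prime modulus such as `is_universal(5, 2)` the check passes. With a composite modulus such as `is_universal(4, 1)` it fails (inputs 0 and 2 collide for half of the keys), which suggests the check depends on the multiplicative structure assumed in this sketch.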

Theorem 3.1 for correlated $\mathsf{RoR}^{\mathsf{FP}}$ queries. Recall that the definition
of function privacy for subspace membership (Definition 2.3) extends to
adversaries that query the $\mathsf{RoR}^{\mathsf{FP}}$ oracle for secret keys for $T$ correlated subspaces
$W_1, \ldots, W_T$ for any $T = \mathrm{poly}(\lambda)$. If the columns of the jointly sampled
subspaces $[W_1 \mid W_2 \mid \cdots \mid W_T]$ form a block source, we can extend the proof of
function privacy to consider such correlated queries. The adversary's view comprises
$T$ terms as in Equation (2) with randomly sampled vectors $s_1, \ldots, s_T$ in place of
$s$. The collection of functions $g$ remains universal and a simple variant of the
Leftover Hash Lemma implies that for our choice of parameters, the statistical
distance between $\mathsf{View}_{\mathsf{real}}$ and the uniform distribution is negligible in $\lambda$ (and
similarly for $\mathsf{View}_{\mathsf{rand}}$).

3 We note here that a weaker version of the Leftover Hash Lemma will suffice as the
adversary's view does not include $(s_1, \ldots, s_m)$.

Dealing with Small Attribute Spaces. We also consider constructing
subspace-membership encryption schemes where we do not place any
restrictions on the size of the underlying attribute space $\mathbb{S}$. In our generic construction,
observe that correctness requires that $1/|\mathbb{S}|$ be negligible in $\lambda$. If $|\mathbb{S}|$ is not
super-polynomial in the security parameter, then correctness fails with a non-negligible
probability. Additionally, this breaks the proof of attribute-hiding security in
Theorem 3.1. In Equation (1), if the quantity $2Q/|\mathbb{S}|$ is non-negligible, then a
non-negligible advantage of an adversary $\mathcal{A}$ does not translate to a non-negligible
advantage for the reduction algorithm $\mathcal{B}$ against the inner-product encryption
scheme.

To overcome this difficulty, we refine the construction as follows using a
parameter $\tau = \tau(\lambda) \in \mathbb{N}$. We split the message into $\tau$ secret shares and apply parallel
repetition of $\tau$ copies of the underlying inner-product encryption scheme, where
each copy uses independent public parameters and master secret keys. For the
proof of security, it suffices to have $\tau$ such that the quantity $\tau/|\mathbb{S}|^{\tau}$ is negligible in
$\lambda$. Due to space constraints, a formal description of the scheme and a statement
of its security are deferred to [11, Section 4.2].
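
To get a feel for the size of the repetition parameter, one can compute the smallest value for which the quantity above drops below a concrete bound. The sketch below is only indicative: "negligible" is an asymptotic notion, and the concrete target 2^(-security_bits) as well as the function name `smallest_tau` are assumptions made here, not part of the construction.

```python
from fractions import Fraction

def smallest_tau(group_size, security_bits):
    """Smallest tau >= 1 with tau / group_size**tau <= 2**(-security_bits)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    target = Fraction(1, 2**security_bits)
    tau = 1
    # Exact rational arithmetic avoids floating-point underflow for large tau.
    while Fraction(tau, group_size**tau) > target:
        tau += 1
    return tau
```

Under this concrete target, a binary attribute space needs `smallest_tau(2, 80)` repetitions while a 16-bit space needs far fewer, illustrating how quickly the requirement shrinks as the space grows.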


4 Applications of Function-Private Subspace-Membership 
Encryption 


4.1 Roots of a Polynomial Equation 

We can construct a predicate encryption scheme for predicates corresponding to
polynomial evaluation. Let $\Phi^{\mathrm{poly}}_{<d} = \{ f_p : p \in \mathbb{S}[X], \deg(p) < d \}$, where

$$f_p(x) = \begin{cases} 1 & \text{if } p(x) = 0 \in \mathbb{S} \\ 0 & \text{otherwise} \end{cases}$$

Correctness and attribute hiding properties of the predicate encryption scheme
for the class of predicates $\Phi^{\mathrm{poly}}_{<d}$ are defined as in the case of a generic predicate
encryption scheme in a natural manner (see [11, Definition 2.3]).

Function-Private Polynomial Encryption. For the class $\Phi^{\mathrm{poly}}_{<d}$, consider a
real-or-random function privacy oracle $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ (along the lines of Definition 2.1)
that takes as input triplets of the form (mode, msk, P), where mode $\in$ {real, rand},
msk is a master secret key, and $P = (P_0, \ldots, P_{d-1}) \in \mathbb{S}^d$ is a circuit representing
a joint distribution over coefficients of polynomials $p$ with $\deg(p) < d$. If mode =
real then the oracle samples $p \leftarrow P$ and if mode = rand then the oracle samples
$p \leftarrow \mathbb{S}^d$ uniformly. It then invokes the algorithm KeyGen(msk, $\cdot$) on $p$ and outputs
the secret key $\mathsf{sk}_p$.

Along the lines of Definition 2.2 we consider a $k$-source function-privacy
adversary $\mathcal{A}$. Such an adversary is given inputs $(1^\lambda, \mathsf{pp})$ and oracle access to
$\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$, and each query to the oracle is a $k$-source (over the coefficients of the
polynomial).
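
Definition 4.1 ($\Phi^{\mathrm{poly}}_{<d}$ function privacy). A predicate encryption scheme for
the class of predicates $\Phi^{\mathrm{poly}}_{<d}$ denoted $\Pi$ = (Setup, KeyGen, Enc, Dec) is $k$-source
function-private if for any probabilistic polynomial-time $k$-source
function-privacy adversary $\mathcal{A}$, there exists a negligible function $\nu(\lambda)$ such that

$$\mathbf{Adv}^{\mathsf{FP}\text{-}\Phi}_{\Pi,\mathcal{A}}(\lambda) := \left| \Pr\left[\mathsf{Expt}^{\mathsf{real}}_{\mathsf{FP}\text{-}\Phi,\Pi,\mathcal{A}}(\lambda) = 1\right] - \Pr\left[\mathsf{Expt}^{\mathsf{rand}}_{\mathsf{FP}\text{-}\Phi,\Pi,\mathcal{A}}(\lambda) = 1\right] \right| \le \nu(\lambda),$$

where for each mode $\in$ {real, rand} and $\lambda \in \mathbb{N}$ the experiment
$\mathsf{Expt}^{\mathsf{mode}}_{\mathsf{FP}\text{-}\Phi,\Pi,\mathcal{A}}(\lambda)$ is defined as follows:

1. $(\mathsf{pp}, \mathsf{msk}) \leftarrow \mathsf{Setup}(1^\lambda)$.

2. $b \leftarrow \mathcal{A}^{\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}(\mathsf{mode}, \mathsf{msk}, \cdot),\, \mathsf{KeyGen}(\mathsf{msk}, \cdot)}(1^\lambda, \mathsf{pp})$.

3. Output $b$.

In addition, such a scheme is statistically $k$-source function private if the above
holds for any computationally-unbounded $k$-source $\Phi^{\mathrm{poly}}_{<d}$ function-privacy
adversary making a polynomial number of queries to the $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ oracle.

Correlated $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ Queries. Definition 4.1 extends to adversaries that query
the $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ oracle on $T$ correlated queries. A scheme $\Pi$ is said to be $(T, k)$-source
(resp. $(T, k)$-block-source) function private if the above holds for adversaries whose
every query $(P_1, \ldots, P_T)$, a joint distribution over $T$ polynomials, is a
$(T, k)$-source (resp. $(T, k)$-block-source).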

Constructing Function-Private Predicate Encryption Schemes Supporting
Polynomial Evaluation. Given a subspace membership encryption
scheme (Setup, KeyGen, Enc, Dec) with parameters $m = d$ and $\ell = 2d - 1$, we can
construct a predicate encryption scheme for $\Phi^{\mathrm{poly}}_{<d}$ as follows (for simplicity, we
consider the instructive case $d = 3$ and subsequently explain how our technique
generalizes):

- Setup: The Setup algorithm remains unchanged.

- Encryption: To encrypt a message $M$ for the attribute $x \in \mathbb{S}$, the
  encryption algorithm sets $\mathbf{x} = (x^4, x^3, x^2, x, 1)^{\mathsf{T}}$ and outputs the ciphertext
  Enc(pp, $\mathbf{x}$, $M$).

- Key generation: To generate a secret key corresponding to the polynomial
  $p = p_2 \cdot x^2 + p_1 \cdot x + p_0$, the key-generation algorithm constructs a vector
  $\mathbf{p} = (p_2, p_1, p_0)^{\mathsf{T}} \in \mathbb{S}^3$. Next, it "blinds" the polynomial $p(x)$ with two linear
  polynomials $r(x) = r_1 \cdot x + r_0$ and $s(x) = s_1 \cdot x + s_0$ and computes the
  coefficients of the polynomial $p(x) \cdot r(x) \cdot s(x)$. The coefficients $r_1, r_0, s_1, s_0$ are
  sampled independently and uniformly at random from $\mathbb{S}$. The key generation
  algorithm repeats this step with two more sets of polynomials (we refer
  to them as "randomizing" polynomials) $r'(x), s'(x)$ and $r''(x), s''(x)$ whose
  coefficients are also sampled uniformly at random. It constructs

  $$W = \begin{pmatrix}
  \text{coefficients of } p(x) \cdot r(x) \cdot s(x) \\
  \text{coefficients of } p(x) \cdot r'(x) \cdot s'(x) \\
  \text{coefficients of } p(x) \cdot r''(x) \cdot s''(x)
  \end{pmatrix} \qquad (3)$$

  $$= \begin{pmatrix}
  p_2 r_1 s_1 & p_2 r_1 s_0 + p_2 r_0 s_1 + p_1 r_1 s_1 & p_2 r_0 s_0 + p_1 r_1 s_0 + p_1 r_0 s_1 + p_0 r_1 s_1 & p_1 r_0 s_0 + p_0 r_0 s_1 + p_0 r_1 s_0 & p_0 r_0 s_0 \\
  p_2 r'_1 s'_1 & p_2 r'_1 s'_0 + p_2 r'_0 s'_1 + p_1 r'_1 s'_1 & p_2 r'_0 s'_0 + p_1 r'_1 s'_0 + p_1 r'_0 s'_1 + p_0 r'_1 s'_1 & p_1 r'_0 s'_0 + p_0 r'_0 s'_1 + p_0 r'_1 s'_0 & p_0 r'_0 s'_0 \\
  p_2 r''_1 s''_1 & p_2 r''_1 s''_0 + p_2 r''_0 s''_1 + p_1 r''_1 s''_1 & p_2 r''_0 s''_0 + p_1 r''_1 s''_0 + p_1 r''_0 s''_1 + p_0 r''_1 s''_1 & p_1 r''_0 s''_0 + p_0 r''_0 s''_1 + p_0 r''_1 s''_0 & p_0 r''_0 s''_0
  \end{pmatrix}$$

  The algorithm then runs KeyGen(msk, $W$) and outputs $\mathsf{sk}_W$.

- Decryption: The decryption algorithm remains unchanged.
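
The key-generation step for d = 3 can be sketched as follows. This is a hedged illustration rather than the scheme itself: the group is modeled as the integers modulo the prime 2^61 - 1, the underlying subspace-membership scheme is omitted, and only the matrix W and the attribute vector are computed. All function names are hypothetical.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; Z_P stands in for the attribute space (assumption)

def poly_mul(a, b, p=P):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def random_linear(p=P):
    """A random linear polynomial [c1, c0] with uniform coefficients."""
    return [secrets.randbelow(p), secrets.randbelow(p)]

def build_W(poly, p=P):
    """poly = [p2, p1, p0]; return the 3x5 matrix whose rows are blinded copies of p."""
    rows = []
    for _ in range(3):
        r, s = random_linear(p), random_linear(p)
        rows.append(poly_mul(poly_mul(poly, r, p), s, p))
    return rows

def attribute_vector(x, p=P):
    """The vector (x^4, x^3, x^2, x, 1) used for encryption."""
    return [pow(x, e, p) for e in (4, 3, 2, 1, 0)]

def mat_vec(W, v, p=P):
    return [sum(a * b for a, b in zip(row, v)) % p for row in W]
```

For p(x) = (x - 3)(x - 5), i.e. `build_W([1, P - 8, 15])`, the product with `attribute_vector(3)` is the zero vector, while for a non-root such as 4 it is nonzero except with small probability over the randomizing polynomials.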


Correctness and Attribute Hiding. Given a ciphertext $c$ for attribute $x$ and
a secret key for polynomial $p$, if $p(x) = 0$ then it follows that $W \cdot \mathbf{x} = 0$. If
$W \cdot \mathbf{x} = 0$, then $x$ is a root of the polynomials $p \cdot r \cdot s$, $p \cdot r' \cdot s'$, and $p \cdot r'' \cdot s''$,
which implies that $x$ is a root of $p(x)$ with overwhelming probability over the
choices of polynomials $r, r', r'', s, s', s'' \in \mathbb{S}[X]$ (see footnote 4). The attribute hiding property of
the scheme follows in a fairly straightforward manner from the attribute hiding
property of the subspace membership encryption scheme.

Function Privacy. We show that with overwhelming probability over the choices
of the randomizing polynomials: (a) if the coefficients of $p$, namely $(p_2, p_1, p_0)$,
are sampled from a $k$-source, then $W$ is distributed according to a $(5, k)$-block
source, and (b) if the coefficients of $p$ are sampled uniformly at random from
$\mathbb{S}^3$, then $W$ is distributed uniformly over $\mathbb{S}^{3 \times 5}$. Given the above two claims, a
straightforward reduction allows us to simulate a $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ oracle given access to
a $\mathsf{RoR}^{\mathsf{FP}}$ oracle for the subspace membership predicate with parameters $m = 3$ and
$\ell = 5$. Thus, we can state the following theorem.

Theorem 4.2. If $\mathcal{SM}$ is a subspace membership encryption scheme with
parameters $m = 3$ and $\ell = 5$ that satisfies function privacy against $(5, k)$-block-source
adversaries, then the predicate encryption scheme for the class of predicates $\Phi^{\mathrm{poly}}_{<3}$
constructed above is statistically function private against $k$-source adversaries.

Applying Theorem 3.1 for adversaries that query the $\mathsf{RoR}^{\mathsf{FP}}$ oracle with $T$
correlated queries immediately gives us the following corollary.

Corollary 4.3. Given any large attribute space inner-product encryption scheme
with $\ell = 5$, there exists a predicate encryption scheme for the class of predicates
$\Phi^{\mathrm{poly}}_{<3}$ that is statistically function-private against $(T, k)$-block-sources for any
$T = \mathrm{poly}(\lambda)$ and $k \ge \log |\mathbb{S}| + \omega(\log \lambda)$.

4 From a simple union bound over the events where three linear polynomials share a
root, this probability works out to be $\ge 1 - 8/|\mathbb{S}|^2$, which is indeed overwhelming.

Proof of Claims (a) and (b). Consider column $w_1 = (p_2 r_1 s_1, \; p_2 r'_1 s'_1, \; p_2 r''_1 s''_1)^{\mathsf{T}}$.
We observe that over choices of $s_1$, $s'_1$, and $s''_1$, the column $w_1$ is distributed
uniformly over $\mathbb{S}^3$. The second column $w_2$ is also distributed uniformly at random
by noting that the elements $p_2 r_1 s_0$, $p_2 r'_1 s'_0$, and $p_2 r''_1 s''_0$ are distributed uniformly
in $\mathbb{S}^3$ over choices of $r_1$, $r'_1$, and $r''_1$ (which are themselves information
theoretically hidden in $w_1$). An identical argument shows that over choices of $r_0$, $r'_0$, and
$r''_0$, and $s_0$, $s'_0$, and $s''_0$, the fourth and fifth columns, $w_4$ and $w_5$, are distributed
uniformly in $\mathbb{S}^3$. This is true even conditioned on all the other columns. It
suffices to show that conditioned on $w_1$, $w_2$, $w_4$, and $w_5$, column $w_3$ has entropy
at least $\log |\mathbb{S}| + \omega(\log \lambda)$.

We re-write $w_3$ as $R \cdot \mathbf{p}$ where

$$R = \begin{pmatrix}
r_0 s_0 & r_1 s_0 + r_0 s_1 & r_1 s_1 \\
r'_0 s'_0 & r'_1 s'_0 + r'_0 s'_1 & r'_1 s'_1 \\
r''_0 s''_0 & r''_1 s''_0 + r''_0 s''_1 & r''_1 s''_1
\end{pmatrix} \qquad (4)$$

With overwhelming probability over random choices of all the coefficients in
the polynomials $r$, $s$, $r'$, $s'$, $r''$, and $s''$, the matrix $R$ is full-rank over $\mathbb{S}$. Therefore,
the distribution of $w_3$ has a one-one correspondence with the distribution of $\mathbf{p}$.
Hence, $w_3$ has entropy at least $k$ even given $R$ if $\mathbf{p}$ is sampled from a $k$-source,
and $w_3$ is uniform over $\mathbb{S}^3$ even given $R$ if $\mathbf{p}$ is sampled uniformly from $\mathbb{S}^3$. This
concludes the proof of claims (a) and (b). ∎
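
The identity between the third column and the matrix in Equation (4) can be checked numerically, together with the full-rank condition. The sketch assumes the group is the field of integers modulo the prime 2^61 - 1; the helper names are hypothetical. Under that field assumption, the determinant of R is a polynomial of total degree 6 in the twelve blinding coefficients, so the Schwartz-Zippel lemma bounds the probability that it vanishes by 6/|S|, which is one way to make "overwhelming probability" concrete.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; Z_P stands in for the group (assumption)

def random_blinders(p=P):
    """Three pairs ((r1, r0), (s1, s0)) of uniformly random linear polynomials."""
    rnd = lambda: secrets.randbelow(p)
    return [((rnd(), rnd()), (rnd(), rnd())) for _ in range(3)]

def third_column_and_R(poly, blinders, p=P):
    """poly = (p2, p1, p0). Return the x^2-coefficient column w3 and the matrix R."""
    p2, p1, p0 = poly
    w3, R = [], []
    for (r1, r0), (s1, s0) in blinders:
        # x^2 coefficient of p(x) * r(x) * s(x), expanded term by term
        w3.append((p2 * r0 * s0 + p1 * r1 * s0 + p1 * r0 * s1 + p0 * r1 * s1) % p)
        R.append([(r0 * s0) % p, (r1 * s0 + r0 * s1) % p, (r1 * s1) % p])
    return w3, R

def mat_vec(M, v, p=P):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def det3(M, p=P):
    """Determinant of a 3x3 matrix over Z_p by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % p
```

For any coefficient vector such as `(7, 11, 13)`, the column returned by `third_column_and_R` equals `mat_vec(R, poly)`, and `det3(R)` is nonzero for all but a tiny fraction of blinders, so the map from p to the third column is a bijection.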

A General Technique for $\Phi^{\mathrm{poly}}_{<d}$. As stated earlier, we can construct predicate
encryption for the class of predicates $\Phi^{\mathrm{poly}}_{<d}$ starting with a subspace membership
encryption scheme with parameters $m = d$ and $\ell = 2d - 1$. The main idea
in extending beyond $d = 3$ is to construct $d$ randomized "blindings" of $p(x)$.
For $i \in [d]$, the $i$-th row of $W$ now comprises coefficients of a polynomial
$p(x) \cdot r_{i,1}(x) \cdots r_{i,d-1}(x)$ where each of the $r_{i,j}(x)$'s is a random linear polynomial
sampled as $r(x)$ and $s(x)$ are sampled in the $d = 3$ construction. Due to space
constraints the details of the construction are deferred to the full version
[11, Section 5.1].

Comparing Entropy Requirements. In Definition 4.1 and Corollary 4.3 it
suffices to consider function-privacy adversaries that query the "real-or-random"
oracle with polynomials whose coefficients come from a $k$-source. We do not
require the sources to have conditional min-entropy, in contrast to subspace
membership function privacy (see Definition 2.3 and the discussion in Section 2). The
reason this weaker restriction on $\Phi^{\mathrm{poly}}_{<d}$ function-privacy adversaries suffices when
it does not suffice against subspace membership function-privacy adversaries is
that the class of predicates $\Phi^{\mathrm{poly}}_{<d}$ offers a weaker functionality than is offered by
subspace membership. In particular, if the adversary evaluates ciphertexts with
attributes corresponding to "ill-formed" non-Vandermonde vectors, i.e., vectors
not of the form $(1, x, x^2, \ldots)$, correctness of decryption is not guaranteed and the
particular attack outlined in Section 2 fails. It is easy to see this in our
construction as well: the randomizing polynomials ensure correctness only holds when
the subspace membership predicate is evaluated on Vandermonde vectors.

4.2 Function-Private IBE with Minimal Unpredictability

As discussed in Section 1, the IBE schemes of Boneh et al. [10] are function
private only for identity distributions with min-entropy at least $\lambda + \omega(\log \lambda)$.
However, the only inherent restriction required for a meaningful notion of security
is that identity distributions have min-entropy $\omega(\log \lambda)$. In this section,
starting with predicate encryption schemes for polynomial evaluation constructed in
Section 4.1, we construct an IBE scheme satisfying function privacy with only a
super-logarithmic min-entropy restriction on identity distributions.

Scheme. Consider a predicate encryption scheme for the class of linear
predicates $\Phi^{\mathrm{poly}}_{<2}$ comprising algorithms (Setup, KeyGen, Enc, Dec). From Section 4.1,
such a predicate encryption scheme can be built from any underlying subspace
membership scheme for parameters $m = 2$ and $\ell = 3$. Given such a scheme, we
construct an IBE scheme $\mathcal{IBE}_{\mathsf{OPT}}$ for the space of identities $\mathbb{S}$ as follows.

- Setup: On input $1^\lambda$, the IBE setup algorithm runs Setup($1^\lambda$) to receive
  (pp, msk) and publishes pp.

- Key generation: On input msk and an identity $\mathsf{id} \in \mathbb{S}$, the key generation
  algorithm constructs a (randomized) polynomial $p_{\mathsf{id}}(x)$ such that $p_{\mathsf{id}}(x) = 0$
  if and only if $x = \mathsf{id}$ as follows. The algorithm samples uniform $r \leftarrow \mathbb{S}$ and
  computes $p_{\mathsf{id}}(x) = r(x - \mathsf{id})$. It then runs the underlying KeyGen algorithm
  to output $\mathsf{sk}_{\mathsf{id}} \leftarrow$ KeyGen(msk, $p_{\mathsf{id}}$).

- Encryption: On input pp, an identity id, and a message $M$, the encryption
  algorithm computes Enc(pp, id, $M$).

- Decryption: On input pp, a ciphertext $c$, and a secret key sk, the decryption
  algorithm simply runs the underlying decryption algorithm to output
  $M \leftarrow$ Dec(pp, sk, $c$).

Correctness of the IBE scheme follows from the correctness of the
underlying $\Phi^{\mathrm{poly}}_{<2}$-predicate encryption scheme. Data privacy and anonymity of the
IBE scheme (see [11, Definition 2.5]) follow directly from the attribute hiding
property of the underlying $\Phi^{\mathrm{poly}}_{<2}$-predicate encryption scheme. In the theorem
that follows, we prove that $\mathcal{IBE}_{\mathsf{OPT}}$ is function-private against minimally
unpredictable sources.

Theorem 4.4. Given any large attribute space inner-product encryption scheme
for dimension $\ell = 3$, there exists an IBE scheme function private against
$(T, k)$-block-sources for any $T = \mathrm{poly}(\lambda)$ and $k \ge \omega(\log \lambda)$.

Proof Outline. For simplicity, consider adversaries that query the real-or-random
oracle with $k$-sources (i.e., $T = 1$). As outlined in Section 4.1, we first construct a
predicate encryption scheme for $\Phi^{\mathrm{poly}}_{<2}$ that is function private against $k'$-sources for
$k' \ge \log |\mathbb{S}| + \omega(\log \lambda)$. We instantiate $\mathcal{IBE}_{\mathsf{OPT}}$ described above with this predicate
encryption scheme.

The proof proceeds by showing that $\mathsf{RoR}^{\mathsf{FP}\text{-}\mathsf{IBE}}$ queries (see [11, Definition
2.6]) $\mathit{ID}$ can be compiled to distributions over coefficients of linear polynomials
$P = (P_1, P_0)$ such that if $H_\infty(\mathit{ID}) = k$, then $H_\infty(P) = k + \log |\mathbb{S}|$. This allows
us to simulate a $\mathsf{RoR}^{\mathsf{FP}\text{-}\mathsf{IBE}}$ oracle given an oracle $\mathsf{RoR}^{\mathsf{FP}\text{-}\Phi}$ for linear polynomials,
thus showing that $\mathcal{IBE}_{\mathsf{OPT}}$ is function-private against $k$-sources if the encryption
scheme for $\Phi^{\mathrm{poly}}_{<2}$ is function-private against $k'$-sources. Due to space constraints,
the reader is referred to the full version for details [11, Section 5.1].
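
The entropy gain from the random multiplier can be computed exactly for tiny parameters. The sketch below makes two assumptions that go beyond the text: the identity space is the integers modulo a small prime p, and the multiplier r is drawn from the nonzero elements only (the value r = 0, which occurs with probability 1/|S| under uniform sampling, would map every identity to the zero polynomial and is left out here). Under these assumptions the min-entropy grows by log2(p - 1), which approximates the log |S| term in the proof outline.

```python
from collections import Counter
from fractions import Fraction
from math import log2

def min_entropy(dist):
    """Min-entropy in bits of a distribution given as {outcome: probability}."""
    return -log2(max(dist.values()))

def compile_identity_source(id_dist, p):
    """Distribution of (P1, P0) = (r, -r * id mod p) for r uniform over nonzero Z_p.

    These are the coefficients of p_id(x) = r * (x - id).
    """
    out = Counter()
    for ident, prob in id_dist.items():
        for r in range(1, p):
            out[(r, (-r * ident) % p)] += Fraction(prob) / (p - 1)
    return dict(out)
```

For instance, a uniform distribution over four identities modulo 17 has 2 bits of min-entropy, and the compiled coefficient distribution has 2 + log2(16) = 6 bits.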

Fully-Secure Function-Private IBE. Current constructions of inner-product
encryption schemes [23,3] satisfy a selective notion of security where the
challenge attributes are chosen by the adversary before seeing the public parameters.
Our transformation of inner-product encryption schemes to function-private IBE
schemes with minimal unpredictability is not limited to selective security.
Starting from an inner-product encryption scheme satisfying an adaptive version of
attribute hiding, we can construct fully-secure IBE schemes. We also note that
the standard complexity leveraging approach (see [7, Section 7.1]) gives a generic
transformation from selectively-secure IBE to fully-secure IBE. This approach
does not modify the key generation algorithm and therefore preserves function
privacy.

5 Conclusions and Open Problems 

Our work proposes subspace-membership encryption and constructs the first
such function-private schemes from any inner-product encryption scheme. We
also show its application to constructing function-private polynomial encryption
schemes and function-private IBE schemes with minimal unpredictability. In this
section, we discuss a few extensions and open problems that arise from this work.

Function Privacy from Computational Assumptions. In this work we
construct subspace-membership schemes that are statistically function private.
Although the construction of inner-product encryption schemes from lattices [3]
presents an immediate function-privacy attack, we were unable to find such at- 
tacks for the construction from composite-order groups [23] (or its prime order 
variant BSD- We conjecture that suitable “min-entropy” variants of the deci- 
sional Diffie-Hellman assumption ns have a potential for yielding a proof of 
computational function privacy for these schemes. 

Other Predicates. A pre-cursor to the work on predicate encryption support- 
ing inner-products was work on predicate encryption supporting comparison 
and range queries by Boneh and Waters m ■ They achieve this by constructing 
predicate encryption supporting an interesting primitive, denoted Hidden-Vector
Encryption (HVE). Briefly, in HVE, attributes correspond to vectors over an
alphabet $\Sigma$ and secret keys correspond to vectors over the augmented alphabet
$\Sigma \cup \{*\}$. Decryption works if the attributes and secret key match for every
coordinate that is not a $*$.
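
The HVE matching rule just described amounts to a wildcard comparison. The following minimal Python sketch shows only the predicate, not any encryption; the name `hve_match` and the use of strings as vectors are illustrative assumptions.

```python
WILDCARD = "*"

def hve_match(attribute, key_pattern):
    """True if attribute and key agree on every coordinate where the key is not '*'."""
    if len(attribute) != len(key_pattern):
        raise ValueError("attribute and key pattern must have the same length")
    return all(k == WILDCARD or a == k for a, k in zip(attribute, key_pattern))
```

For example, the key pattern `"0*1*"` matches the attribute `"0110"` but the pattern `"1*1*"` does not.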

HVE can be implemented using inner-product encryption schemes [23] but it
breaks function privacy in a rather trivial manner. Formalizing function privacy
for HVE does not immediately follow from the notion of function privacy for
inner-products because of the role played by $*$. The questions of formalizing
function privacy (which in turn will imply realistic notions also for encryption
supporting range and comparison queries) and designing function-private HVE
schemes are left as open problems. It is also open to formalize security and
design function-private encryption schemes that support multivariate polynomial
evaluation.

Enhanced Function Privacy. A stronger notion of function privacy, denoted
enhanced function privacy [10], asks that an adversary learn nothing more than
the minimum necessary from a secret key even given corresponding
ciphertexts with attributes that allow successful decryption. Constructing enhanced
function-private schemes for subspace membership and inner products is an
interesting line of research that may require new ideas and techniques.

Abstract. This paper initiates the study of preserving differential privacy
(DP) when the data-set is sparse. We study the problem of constructing
an efficient sanitizer that preserves DP and guarantees high utility
for answering cut-queries on graphs. The main motivation for studying
sparse graphs arises from the empirical evidence that social networking
sites are sparse graphs. We also motivate and advocate the necessity
of including the efficiency of sanitizers, in addition to the utility
guarantee, if one wishes to have a practical deployment of privacy-preserving
sanitizers.

We show that the technique of Blocki et al. 0 (BBDS) can be adapted 
to preserve DP for answering cut-queries on sparse graphs, with an
asymptotically more efficient sanitizer than BBDS. We use this as the base
technique to construct an efficient sanitizer for arbitrary graphs. In
particular, we use a preconditioning step that preserves the spectral
properties (and therefore the size of any cut), and then apply our
basic sanitizer. We first prove that our sanitizer preserves DP for graphs
with high conductance. We then carefully compose our basic technique
with the modified sanitizer to prove the result for arbitrary graphs. In
a certain sense, our approach is complementary to the Randomized
sanitization for answering cut queries [13]: we use graph sparsification, while
Randomized sanitization uses graph densification.

Our sanitizer almost achieves the best of both worlds with the
same privacy guarantee, i.e., it is almost as efficient as the most
efficient sanitizer and it has a utility guarantee almost as strong as the utility
guarantee of the best sanitization algorithm.

We also make some progress in answering a few open problems posed by BBDS.
We make a combinatorial observation that allows us to argue that the
sanitized graph can also answer $(S, T)$-cut queries with the same asymptotic
efficiency, utility, and DP guarantee as our sanitization algorithm for
$(S, \bar{S})$-cuts. Moreover, we achieve a better utility guarantee than
Gupta, Roth, and Ullman [17]. We give a further optimization by showing that
the fast Johnson-Lindenstrauss transform of Ailon and Chazelle [2] also
preserves DP.

Keywords: Differential privacy, Graph sparsification, $(S, T)$-cut queries,
Fast Johnson-Lindenstrauss transform.
1 Introduction

The privacy of data is a fundamental problem in today's age of information.
Many agencies collect enormous amounts of data and store them in their
databases. These data may contain sensitive information about individuals.
However, given the benefits of analyzing these data, the problem that
curators of such a database face is to provide useful information in such a
manner that no personal or sensitive information about an individual is
leaked. A trivial way to guarantee this is to add a lot of noise to the
database; however, nothing useful could be harnessed from such a noisy
database. Most of the research in this area is geared towards providing a
tight utility and privacy tradeoff and keeps only the query generator in
mind. In this paper, we take a conceptual view and ask the practical
question: what would a firm that is going to deploy these sanitizers demand
from the group that develops these algorithms?

The question one expects from real firms or agencies is what extra
resources they have to invest to provide this facility. This is to be
expected in the real world, because a curator would prefer to deploy its
resources on other interfaces that are primary to its business if a
differentially private sanitizer uses a lot of resources. In general,
sanitizers run in polynomial time, but the exact bound on this polynomial is
never made explicit in earlier works. In fact, Exponential sanitization may
be intractable! We initiate the study of whether it is possible to guarantee
DP with a high utility guarantee using an efficient sanitizer, with emphasis
on a concrete bound on the efficiency parameter.

Motivation of Our Problem. Our motivation for studying cut queries on sparse
graphs arises from a natural problem in social networks. One of the
questions commonly asked about a social network is: given a set of
individuals, how many friends or acquaintances do they have outside their
circle? The natural approach to this problem is to construct a friendship
graph, where each vertex is labeled by an individual and there is an edge
between two vertices if the corresponding individuals are friends. These
graphs on social networks are usually sparse, i.e., the average degree of
the graph is very small in comparison to the number of vertices.

For a concrete example, consider the friendship graph on Facebook. According
to recent data released by Facebook, it has around one billion active users!
It is not outrageous to assume that only a small fraction of users on
Facebook have more than a thousand friends. Therefore, this graph is highly
sparse. The friendship graph is undirected; however, this might not always
be the case. For example, consider the following graph based on the network
of Twitter. It is a directed graph with nodes labeled by individuals. A node
is the tail of an edge if the individual follows the head of the edge. The
number of active users on Twitter is a few million; however, it is unlikely
that an individual follows more than a few hundred fellow users. Thus, this
follower graph is very sparse. In these scenarios, the difference between
performing $10^{9 \times 2.38}$ and $10^{18}$ algebraic operations is huge.
Any firm, like Facebook or Twitter, that is motivated by economics is less
likely to invest in the former sanitization algorithm and may consider
investing in the latter one.
It could be argued that if a sanitizer works for dense graphs (and therefore,
also for sparse graphs), then there is no need for a specialized sanitizer
for sparse graphs. The reason why we feel it is important to study sparse
graphs exclusively is that sanitizers for dense graphs do not use the
structural properties present in sparse graphs. In general, sparse graphs
admit faster algorithms. For example, consider the Johnson-Lindenstrauss
(JL) sanitizer of BBDS. The sanitization algorithm of BBDS first overlays a
complete graph on top of the input graph and then applies the JL transform
to the columns of the Laplacian of the modified graph. The step in which we
overlay the complete graph destroys all the structural properties the input
graph might have.

Now consider the situation when the input graph is sparse, and a hypothetical
sanitizer that does not overlay $K_n$ on top of it. When we perform a random
projection on this graph, the number of operations would depend on the number
of edges of the graph (more concretely, on the non-zero entries in the
representative matrix of the graph, which could be the Laplacian or the
adjacency matrix). Unfortunately, this sanitizer is not differentially
private if the graph is weakly connected (in graph-theoretic terms, has low
conductance).

To see why this hypothetical sanitization does not provide DP, consider an
n-vertex graph with two connected components. If the query is to find the
cut of the set of all vertices in one component, the answer is 0 with
probability 1. However, for a graph that has an edge joining two vertices in
different connected components, the response to the query is non-zero with
probability 1. This gives an easy way to differentiate the two cases. To
resolve this particular problem, BBDS overlaid an n-vertex complete graph on
top of the input graph.
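
The distinguishing argument above can be checked on a toy instance. The sketch below is only an illustration: the six-vertex graph, the helper name `cut_weight`, and the choice of the bridging edge are assumptions made for this example and do not come from the paper. It computes exact (unsanitized) cuts, which is what a sanitizer that merely rescales or projects without overlaying a connected graph would reveal about this query.

```python
def cut_weight(edges, subset):
    """Total weight of the edges with exactly one endpoint in `subset`."""
    inside = set(subset)
    return sum(w for (u, v, w) in edges if (u in inside) != (v in inside))

# Two triangles: connected components {0, 1, 2} and {3, 4, 5}.
graph = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1)]
# Neighboring graph: one extra edge bridging the two components.
neighbor = graph + [(2, 3, 1)]

component = {0, 1, 2}
cut_without_bridge = cut_weight(graph, component)
cut_with_bridge = cut_weight(neighbor, component)
print(cut_without_bridge, cut_with_bridge)
```

The query on one whole component is zero in the first graph and non-zero in its neighbor, so the two inputs are told apart with certainty.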

Unfortunately, if we overlay the complete graph on a sparse graph, then we
destroy the sparsity and lose any (possible) gain in computational time. On
the other hand, even if the graph is connected and we do not perturb the
graph, chances of privacy leakage are still present. More specifically,
adding a single edge to a sparse graph can potentially leak more than the
corresponding change in a dense graph. For example, consider a line graph or
a tree. They are acyclic; however, adding any edge introduces a cycle. A
slight modification of the differentiating algorithm used in the case of the
two-component graph could be used to break DP.

Our Contributions. This work is motivated by practical scenarios in which a
sanitizer might be deployed. One of the objectives of this paper is to
advocate that, in addition to the utility and privacy guarantees, a design
methodology for sanitizers should also give a concrete analysis of the
efficiency of sanitizers. We initiate this line of work by studying
differentially private sanitizers for cut queries on graphs.

As mentioned above, in practice, sparse graphs are more likely to occur than
dense graphs. Every sanitizer proposed in the literature for dense graphs
also works for sparse graphs, but such sanitizers are not efficient.
Moreover, there are examples of sparse graphs that could leak more
information in the DP sense than dense graphs, mainly because an addition or
deletion of an edge could change


Random Projections, Graph Sparsification, and Differential Privacy 279 


the graph properties more dramatically in sparse graphs than in dense graphs.
Thus, the problem is non-trivial, especially when we wish to construct an
efficient sanitizer.

On the fundamental level, we advocate the need of considering the efficiency 
of the sanitizer in the design methodology and give an explicit bound on the 
running time. The reason why we believe this is an important parameter is that, 
in many practical scenarios, the data-sets are held by agencies who might not 
have the privacy of an individual as their biggest priority. Therefore, unless a 
sanitizer is efficient, they might not have any incentive to perform the required 
sanitization. The technical contributions of this paper are as follows. 

1. We show that it is possible to adapt the JL sanitizer of BBDS to answer
cut queries when the graph is sparse. Additionally, our sanitizer has to
perform only $O(n^{2+o(1)})$ algebraic operations. On the other hand,
irrespective of whether or not the graph is sparse, the sanitizer of BBDS
needs $O(rn^{2.38})$ operations, where $r$ is the dimension of the subspace
to which the JL transform projects the columns of the Laplacian. This
improvement is asymptotically significant.

2. A natural question that arises next is whether our basic sanitizer is
useful when the graph is fairly dense. We answer this question in the
affirmative. More precisely, we show that if we precondition a graph by
reassigning the weights to the edges such that the transformed graph is
guaranteed to be sparse and maintains the spectral properties of the graph,
then applying the basic sanitizer on the conditioned graph preserves DP.
This can be seen as a complementary approach to the Randomized
sanitizer [13].

3. We make a simple combinatorial observation to argue that our sanitizer
also preserves $(S, T)$-cut queries. This answers an open problem raised by
BBDS.

4. Our last contribution is directed towards the optimization of the
algebraic computations. We show that DP is maintained even if we replace the
standard JL transform by the fast JL transform of Ailon and Chazelle [2].
This partially answers another open problem of BBDS.

Remark 1. An important characteristic of our preconditioning step, in item 2
above, is that it preserves the spectral properties. Any sanitizer that
answers queries based on spectral properties of a graph could be transformed
to first apply the preconditioning step before the sanitization step to
improve the efficiency.

We note that none of our sanitizers randomly projects the column vectors of
the matrix representing the graph to a smaller dimension $r$. The main
observation is that the mechanism is non-interactive, and in order to
preserve the privacy for all sets of queries, the dimension of the projected
space has to be at least the dimension of the input space.

Overview of our Techniques. We first give a brief overview of the sanitizer
of BBDS. In BBDS, the sanitization algorithm first reweighs the graph by
overlaying a complete graph on top of the input graph, i.e., every edge,
$e = (a, b)$, with weight $w_e$ is replaced by an edge with weight
$w'_e := \frac{w}{n} + \left(1 - \frac{w}{n}\right) w_e$. In other words, the
weights on the edges are redistributed such that the overlaid complete graph
has an equal weight on all its edges. This makes the graph connected, and
from Lemma 4 the smallest non-zero eigenvalue of this modified graph is
greater than $w/n$. The JL transform is then applied on the columns of the
Laplacian of the modified graph.

1 Assuming that the matrix multiplication is done using Coppersmith-Winograd's
algorithm.

The major challenge that we face is that sparse graphs have low conduc- 
tance. When we overlay the complete graph, it increases the conductance, but 
simultaneously makes it inefficient to answer the cut-queries by destroying the 
structural properties. As an extreme, consider a line graph. The cut-queries are 
fairly straightforward to answer; however, if we overlay a complete graph on top 
of it, we destroy the structural property and need to do some extra arithmetic to 
answer the same set of queries. An alternative is to overlay a sparse graph that 
increases the conductance, but does not destroy the structure of the underlying 
graph by much. The most natural candidate for this is an expander graph. As 
we show in our analysis, this suits our purpose very well. 

We modify our basic sanitization technique, as outlined above, to construct 
an efficient sanitization technique for dense graphs. The key idea is to use graph 
sparsification at an appropriate step. As a warm-up, we assume that the input 
graph has high conductance. Our technique in this case is simple: apply the graph 
sparsification algorithms followed by the JL transform. The key observation here 
is that conductance helps in proving that, when the sparsification technique is 
applied on two neighboring graphs, the corresponding sparse graphs differ on at 
most one edge. This allows us to use the proof of BBDS for DP. On the other 
hand, due to the spectral guarantee provided by the sparsification technique,
we know that all the cuts of the graph are maintained within a
multiplicative factor. The utility guarantee then follows using simple
arithmetic.

In order to apply the above analysis to arbitrary graphs, we need a
high-conductance graph. This dictates the order of the steps we follow for
arbitrary graphs, i.e., we first overlay a complete graph (or an expander)
on the input graph before applying the sparsification algorithm. Finally, we
apply the JL transform.

Related Work. Differential privacy, introduced by Dwork et al., provides a
robust guarantee of privacy. Informally speaking, if a curator sanitizes a
data-set, then even if an individual's data is removed from the database,
none of the responses to a query becomes significantly more or less likely.
The key idea used in Dwork et al. is to add noise to the output of the query
according to a Laplace distribution, where the distribution is parameterized
by the sensitivity of the query function. The Gaussian variant of this basic
sanitizer was proven to preserve DP by Dwork et al. in a follow-up work.
Since then, many sanitizers for preserving DP have been proposed in the
literature, including the Exponential sanitizer, the Multiplicative Update
sanitizer [16]-[19], the Median sanitizer, the Boosting sanitizer [10], and
the Random Projection sanitizer [25]. All these sanitizers have a common
theme: they perturb the output before responding to queries. Blocki et al.
(BBDS) took a complementary approach. They perturb the input by performing a
random projection of the input and show that existing algorithms preserve DP
if the input is perturbed in a reversible manner.

The first work to explicitly study DP when the underlying data-set is a graph
or a social network was by Hay et al. [20]. They presented a differentially
private sanitizer for answering the degree of a node in a graph. They were
followed by the works of Nissim et al. and Karwa et al. Gupta et al. [17]
first studied the question of answering $(S, T)$-cut queries. The literature
on faster computations on sparse variants of mathematical objects is so
extensive that we cannot hope to cover it in full detail here. An extensive
study of faster methods for linear algebra on sparse matrices is covered in
standard textbooks [7], [18]. We refer the readers to an excellent book by
Nesetril and de Mendez [28] for the properties of and algorithms on sparse
graphs.

Organization of the Paper. In Section 2, we cover the basic preliminaries and
definitions to the level required to understand the presentation of this
paper. In Section 3, we give our basic sanitizer for sparse graphs, which
serves as the building block for the sanitizers in Section 4. We conclude
the paper by showing in Section 5 that the fast JL transform of Ailon and
Chazelle [2] also preserves DP.

2 Preliminaries, Notations, and Basic Definitions 

2.1 Privacy and Utility 

In this work, we deal with privacy-preserving sanitizers for answering cut queries 
on graphs. The notion of differential privacy requires a definition of neighboring 
data-sets. Two data-sets (graphs, respectively) are neighboring if they differ on 
at most one entry (edge, respectively). 

Definition 1. A randomized algorithm $\mathcal{K}$, also called a sanitizer,
gives $\varepsilon$-DP if for all neighboring data-sets $D_1$ and $D_2$, and
all $S \subseteq \mathrm{Range}(\mathcal{K})$,
$$\Pr[\mathcal{K}(D_1) \in S] \le \exp(\varepsilon)\,\Pr[\mathcal{K}(D_2) \in S],$$
where the probability is over the coin tosses of the sanitizer $\mathcal{K}$.

In this paper, we study a natural relaxation of DP, called approximate DP.

Definition 2. A randomized algorithm $\mathcal{K}$, also called a sanitizer,
gives $(\varepsilon, \delta)$-DP if for all neighboring data-sets $D_1$ and
$D_2$, and all $S \subseteq \mathrm{Range}(\mathcal{K})$,
$$\Pr[\mathcal{K}(D_1) \in S] \le \exp(\varepsilon)\,\Pr[\mathcal{K}(D_2) \in S] + \delta,$$
where the probability is over the coin tosses of the sanitizer $\mathcal{K}$.
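
For mechanisms with a finite output range, the condition of Definition 2 can be checked directly. The following sketch is an illustration under stated assumptions: the helper names `smallest_delta` and `randomized_response` are hypothetical, and randomized response on a single bit is used only as a familiar stand-in mechanism, not as one of the sanitizers studied here. It uses the fact that the worst-case event $S$ collects exactly the outcomes where the first probability exceeds $\exp(\varepsilon)$ times the second.

```python
import math

def smallest_delta(p1, p2, eps):
    """Smallest delta with P1(S) <= exp(eps) * P2(S) + delta for every event S.

    p1 and p2 map outcomes to probabilities. The worst-case event contains
    exactly the outcomes where p1 exceeds exp(eps) * p2.
    """
    outcomes = set(p1) | set(p2)
    return sum(max(0.0, p1.get(o, 0.0) - math.exp(eps) * p2.get(o, 0.0))
               for o in outcomes)

def randomized_response(bit, eps):
    """Output distribution when `bit` is reported truthfully w.p. e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}

eps = 0.5
d0, d1 = randomized_response(0, eps), randomized_response(1, eps)
# Neighboring inputs are the two possible bits; check both directions.
delta_at_eps = max(smallest_delta(d0, d1, eps), smallest_delta(d1, d0, eps))
delta_at_half_eps = max(smallest_delta(d0, d1, eps / 2),
                        smallest_delta(d1, d0, eps / 2))
print(delta_at_eps, delta_at_half_eps)
```

The first value is zero up to rounding (pure $\varepsilon$-DP), while asking for the stronger parameter $\varepsilon/2$ forces a strictly positive $\delta$.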

2.2 Linear Algebra 

Let $A$ be an $n \times m$ matrix. We let $\mathrm{rk}(A)$ denote the rank
and $\mathrm{Tr}(A)$ denote the trace norm of the matrix $A$. The singular
value decomposition of $A$ is $A = V \Lambda U^T$, where $U, V$ are unitary
matrices and $\Lambda$ is a diagonal matrix. The entries of $\Lambda$,
denoted by $\lambda_1(A), \ldots, \lambda_{\mathrm{rk}(A)}(A)$, are called
the singular values of $A$. Since $U, V$ are unitary matrices, one can write
$A^i = V \Lambda^i U^T$ for any real value $i$. If $A$ is not a full rank
matrix, then its inverse is called the Moore-Penrose inverse and is denoted
by $A^\dagger$, and its determinant is called the pseudo-determinant and is
defined as $\Delta(A) = \prod_{i=1}^{\mathrm{rk}(A)} \lambda_i(A)$. We let
$\chi_S \in \{0, 1\}^n$ denote the characteristic vector for a subset
$S \subseteq V$.

2.3 Gaussian Distribution 

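Given a random variable, $X$, we denote by $X \sim \mathcal{N}(\mu, \sigma^2)$
the fact that $X$ is distributed according to a Gaussian distribution with
the probability density function
$$\mathrm{PDF}_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
The Gaussian distribution is invariant under affine transformation, i.e., if
$X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$
are independent, then $Z = aX + bY$ has the distribution
$Z \sim \mathcal{N}(a\mu_x + b\mu_y,\ a^2\sigma_x^2 + b^2\sigma_y^2)$.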

Multivariate Gaussian Distribution. The multivariate Gaussian distribution
is a generalization of the univariate Gaussian distribution. Given an
$m$-dimensional multivariate random variable, $X \sim \mathcal{N}(\mu, \Sigma)$
with mean $\mu \in \mathbb{R}^m$ and covariance matrix
$\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^T]$, the PDF of a multivariate
Gaussian is given by
$$\mathrm{PDF}_X(x) := \frac{1}{\sqrt{(2\pi)^m \det(\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right).$$
It is easy to see from the description of the PDF that, in order to define
the PDF corresponding to a multivariate Gaussian distribution, $\Sigma$ has
to have full rank. If $\Sigma$ has a non-trivial kernel space, then the PDF
is undefined. However, in this paper, we only need to compare the
probability distributions of two random variables which are defined over the
same subspace. Therefore, in those scenarios, we restrict our attention to
the subspace orthogonal to the kernel space of $\Sigma$.

The multivariate Gaussian distribution maintains many key properties of the
univariate Gaussian distribution. For example, any (non-empty) subset of
multivariate normals is multivariate normal. Another key property that is
important in our analysis is that linearly independent linear functions of
multivariate normal random variables are multivariate normal random
variables, i.e., if $Y = AX + b$, where $A$ is an $n \times n$ non-singular
matrix and $b$ is a (column) $n$-vector of constants, then
$Y \sim \mathcal{N}(A\mu + b, A\Sigma A^T)$.

2.4 Graph Theory 

We reserve the symbols $\mathcal{G}$ and $\mathcal{H}$ to denote graphs. We
denote by $\mathcal{G}'$ the graph formed by adding an edge to the graph
$\mathcal{G}$. In the case when $\mathcal{H}$ is formed from $\mathcal{G}$
using some transformation, we denote by $\mathcal{H}'$ the graph formed by
performing the same transformation on $\mathcal{G}'$. For any
$S \subseteq V(\mathcal{G})$, the cut of the set of vertices $S$, denoted by
$\Phi_{\mathcal{G}}(S)$, is the weight of the edges that are present between
$S$ and $V \setminus S$.

We follow the terminology of BBDS to define the utility guarantee.

Definition 3. We say a sanitizer $\mathcal{K}$ gives an
$(\eta, \tau, \nu)$-approximation for cut queries if for every non-empty set
$S \subset V$, it holds that
$$\Pr\left[(1 - \eta)\Phi_{\mathcal{G}}(S) - \tau \le \mathcal{K}(S, \mathcal{G}) \le (1 + \eta)\Phi_{\mathcal{G}}(S) + \tau\right] \ge 1 - \nu.$$
For the entire paper, we fix w = O ^ 1 °e( 1 / <; )+ lo g( 1 / 1 ') ^ _

Laplacian of a Graph. For a weighted graph $\mathcal{G} := (V, E, w)$, its
adjacency matrix $A_{\mathcal{G}}$ is given by $A_{\mathcal{G}}(i,j) = w_{ij}$
if $(i,j) \in E$. The degree matrix of a weighted graph $\mathcal{G}$ is
given by a diagonal matrix $D_{\mathcal{G}}$ whose diagonal entry $(i, i)$ is
$\sum_j A_{\mathcal{G}}(i,j)$. The signed-edge matrix, $B_{\mathcal{G}}$, is
constructed in the same fashion as in BBDS: let $O$ be an arbitrary
orientation of the edges. For an edge $e = (u,v)$, place $\sqrt{w_e}$ at
position $(e, v)$ if the edge $e$ has $v$ as its head and $-\sqrt{w_e}$ if it
has $v$ as its tail. For every other position $(e, i)$ with $i \ne u, v$,
place 0.

The matrix for the Laplacian of a weighted graph, denoted by
$L_{\mathcal{G}}$, is defined as $D_{\mathcal{G}} - A_{\mathcal{G}}$. One of
the most useful forms of the Laplacian of a graph is the following:
$L_{\mathcal{G}} = \sum_{(a,b) \in E} w_{ab} L_{ab} = B_{\mathcal{G}}^T B_{\mathcal{G}}$,
where $L_{ab}$ is the Laplacian of a graph with a single edge $(a, b)$. Many
interesting properties of the Laplacian of a graph follow from this
representation. For example, the Laplacian of a graph is positive
semi-definite, i.e., all the eigenvalues are non-negative. For a set $S$ of
vertices, its cut-set is
$\Phi_{\mathcal{G}}(S) = \chi_S^T L_{\mathcal{G}} \chi_S$. Moreover, for
$S, T \subseteq V$, the sum of the weights of the edges with one end in $S$
and the other in $T$ is denoted by $\Phi_{\mathcal{G}}(S, T)$. We explore
this in detail later in Section 4.3.
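
The identities in this paragraph are easy to verify numerically. The sketch below is an illustration only: the four-vertex weighted graph and the helper names (`laplacian`, `signed_edge_matrix`, `gram`, `quadratic_form`) are assumptions for this example, and each edge $(u, v)$ is oriented from $u$ to $v$. It builds $L = D - A$, checks that it equals $B^T B$, and evaluates a cut as $\chi_S^T L \chi_S$.

```python
import math

def laplacian(n, edges):
    """L = D - A for a weighted undirected graph given as (u, v, weight) triples."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w
    return lap

def signed_edge_matrix(n, edges):
    """One row per edge: +sqrt(w) at the head v, -sqrt(w) at the tail u."""
    rows = []
    for u, v, w in edges:
        row = [0.0] * n
        row[v] = math.sqrt(w)
        row[u] = -math.sqrt(w)
        rows.append(row)
    return rows

def gram(b, n):
    """Compute B^T B for a matrix B stored as a list of rows."""
    return [[sum(row[i] * row[j] for row in b) for j in range(n)] for i in range(n)]

def quadratic_form(mat, x):
    """Compute x^T mat x."""
    size = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(size) for j in range(size))

n = 4
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 1.0)]
lap = laplacian(n, edges)
btb = gram(signed_edge_matrix(n, edges), n)
chi = [1, 1, 0, 0]                      # characteristic vector of S = {0, 1}
cut_value = quadratic_form(lap, chi)    # edges (1,2) and (0,3) cross the cut
print(cut_value)
```

Changing the orientation of any edge flips the sign of one row of $B$ and leaves $B^T B$ unchanged, which is why the orientation can be arbitrary.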

We let $\lambda_i(\mathcal{G})$ denote the eigenvalues of $L_{\mathcal{G}}$
for $1 \le i \le n$. Next we present a few lemmata that are useful in our
analysis. In our analysis, we analyze multivariate Gaussian distributions
that are linear combinations of the Laplacians of two graphs. In order to
analyze the two distributions, the corresponding covariance matrices must
span the same subspace. The first lemma allows us to work on the same
subspace, that is, the subspace orthogonal to $\mathrm{Span}\{\mathbf{1}\}$.

Lemma 1. [11], [12] Let
$0 = \lambda_1(\mathcal{G}) \le \lambda_2(\mathcal{G}) \le \cdots \le \lambda_n(\mathcal{G})$
be the $n$ eigenvalues of $L_{\mathcal{G}}$. Then $\mathcal{G}$ is connected
iff $\lambda_2 > 0$, and the kernel space of the Laplacian of a connected
graph is $\mathrm{Span}\{\mathbf{1}\}$. More generally, if a graph has $k$
components, then the multiplicity of eigenvalue 0 is $k$.

The following two lemmata are useful in giving the upper bound while proving 
the DP of our sanitizer. 

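Lemma 2. Let $\mathcal{G}$ and $\mathcal{G}'$ be two graphs, where
$\mathcal{G}'$ is obtained from $\mathcal{G}$ by adding one edge joining two
distinct vertices of $\mathcal{G}$. Then

Lemma 3. Let $\mathcal{G}'$ be formed by adding an edge $(u, v)$ to
$\mathcal{G}$. For any vector $x \in \mathbb{R}^n$, we have
$\mathrm{Tr}(L_{\mathcal{G}'}) \le \mathrm{Tr}(L_{\mathcal{G}}) + 2$.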

The following lemma is particularly useful in arguing that the lowest non-zero 
eigenvalues of all the graphs is bounded from below by a constant (which is the 
second smallest eigenvalue of an expander). 

Lemma 4. (Eigenvalue Interlacing). Let $\mathcal{G}$ and $\mathcal{G}'$ be
two graphs, where $\mathcal{G}'$ is obtained from $\mathcal{G}$ by adding one
edge joining two distinct vertices of $\mathcal{G}$. Then

$$\lambda_i(\mathcal{G}) \le \lambda_i(\mathcal{G}') \le \lambda_{i+1}(\mathcal{G}).$$

In particular, if $\mathcal{H}$ is a subgraph of $\mathcal{G}$, then
$\lambda_i(\mathcal{H}) \le \lambda_i(\mathcal{G})$ for all $1 \le i \le n$.

We refer the readers to the excellent book by Godsil and Royle [14] for a
comprehensive treatment of the algebraic properties of graphs.

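Graph Approximation. A graph $\mathcal{H}$ is said to $\varepsilon$-approximate
a graph $\mathcal{G}$ if $\mathcal{H}$ approximates the spectral properties
of $\mathcal{G}$, i.e.,

$$(1 - \varepsilon)\, x^T L_{\mathcal{G}} x \le x^T L_{\mathcal{H}} x \le (1 + \varepsilon)\, x^T L_{\mathcal{G}} x \quad \forall x \in \mathbb{R}^n.$$

We denote it by
$(1 - \varepsilon) L_{\mathcal{G}} \preceq L_{\mathcal{H}} \preceq (1 + \varepsilon) L_{\mathcal{G}}$.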


Electrical Flows and Resistance. We need the concept of electrical flow in
graphs at various points for the analysis of Theorem 7. Intuitively, the
electrical flow of a graph measures how easy or difficult it is to move from
one vertex to the other. If the "resistance" (as described later) between
two vertices is high, then it is more difficult to reach one vertex from the
other, and vice versa. We give a brief exposition of the electrical flow
that is required to understand this paper. Let $i$ be the vector of current
injected at the vertices of the graph $\mathcal{G}$. Then the effective
resistance between two vertices $u$ and $v$ is defined as the potential
difference induced between them when a unit current is injected at one
vertex and extracted from the other. For any pair of vertices $u$ and $v$,
the effective resistance is

$$R_{uv} = (\chi_u - \chi_v)^T L_{\mathcal{G}}^\dagger (\chi_u - \chi_v) = \left\| B_{\mathcal{G}} L_{\mathcal{G}}^\dagger (\chi_u - \chi_v) \right\|_2^2.$$
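
As a small illustration of this definition, the sketch below computes effective resistances on toy graphs. It is not the pseudo-inverse formula itself: it uses the equivalent grounding approach (fix the potential of $v$ to 0, inject a unit current at $u$, and solve the reduced Laplacian system), and the helper name `effective_resistance` and the example graphs are assumptions made for this example. The graph is assumed connected and $u \ne v$.

```python
def effective_resistance(n, edges, u, v):
    """Potential difference between u and v for a unit current from u to v.

    Grounds vertex v (potential 0) and solves the reduced Laplacian system by
    Gauss-Jordan elimination. Assumes a connected graph and u != v.
    """
    lap = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        lap[a][a] += w
        lap[b][b] += w
        lap[a][b] -= w
        lap[b][a] -= w
    keep = [i for i in range(n) if i != v]
    # Augmented matrix [L_reduced | rhs], with a unit current injected at u.
    aug = [[lap[i][j] for j in keep] + [1.0 if i == u else 0.0] for i in keep]
    size = len(keep)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    potentials = {keep[i]: aug[i][size] / aug[i][i] for i in range(size)}
    return potentials[u]

path = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]       # three unit resistors in series
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]   # one resistor in parallel with two
r_path = effective_resistance(4, path, 0, 3)
r_triangle = effective_resistance(3, triangle, 0, 1)
print(r_path, r_triangle)
```

The two values match the series and parallel rules from circuit theory: 3 for the path, and $1 \cdot 2 / (1 + 2) = 2/3$ for the triangle.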


Conductance. At the intuitive level, the conductance of a graph is the
inverse of the resistance. For a graph, $\mathcal{G} = (V, E)$, let $d_v$
denote the degree of vertex $v \in V$. Let $\mathrm{Vol}(S) = \sum_{v \in S} d_v$;
then the conductance of a set of vertices $S$, denoted by
$\mathrm{cond}_S(\mathcal{G})$, is defined by

$$\mathrm{cond}_S(\mathcal{G}) := \frac{|\Phi_{\mathcal{G}}(S)|}{\min\{\mathrm{Vol}(S), \mathrm{Vol}(V \setminus S)\}}.$$

The conductance of a graph $\mathcal{G}$ is then given by
$\mathrm{cond}(\mathcal{G}) := \min_{S \subset V, |S| \ge 1} \mathrm{cond}_S(\mathcal{G})$.
The conductance of a graph has a strong relation to the smallest non-zero
eigenvalue of its Laplacian, and we use it implicitly or explicitly in all of
our analyses.


Theorem 1. (Cheeger's Inequality). For a graph $\mathcal{G}$,
$\mathrm{cond}(\mathcal{G})^2 / 2 \le \lambda_2(L_{\mathcal{G}}) \le 2\,\mathrm{cond}(\mathcal{G})$.
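
For very small graphs, the conductance defined above can be computed by brute force over all non-empty proper subsets. The sketch below is illustrative only: the helper name `conductance` and the two example graphs are assumptions for this example, every vertex is assumed to have positive degree, and the exponential enumeration is meant for intuition rather than for use on real inputs.

```python
from itertools import combinations

def conductance(n, edges):
    """Minimum over non-empty proper subsets S of cut(S) / min(Vol(S), Vol(V - S))."""
    degree = [0.0] * n
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            inside = set(subset)
            cut = sum(w for u, v, w in edges if (u in inside) != (v in inside))
            vol_in = sum(degree[i] for i in inside)
            best = min(best, cut / min(vol_in, total_volume - vol_in))
    return best

cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
# Two triangles joined by a single bridge edge (2, 3).
barbell = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
           (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
cond_cycle = conductance(4, cycle)
cond_barbell = conductance(6, barbell)
print(cond_cycle, cond_barbell)
```

The bridge in the second graph is a bottleneck: the cut separating the two triangles has weight 1 against a volume of 7, which gives the low conductance that the privacy discussion in the introduction is concerned with.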


2.5 JL Transform 

The famous JL transform [I], 0, 0, 0, [ 2 TJ, [22| can be seen as a random projection 
of $d$ points from an $n$-dimensional space to a lower-dimensional space such
that the Euclidean distance between any pair of points is approximately
maintained.

Theorem 2. Fix any $\eta \in (0, 1/2)$ and let $M$ be a $k \times n$ matrix
whose entries are chosen from $\mathcal{N}(0, 1)$. Then
$\forall x \in \mathbb{R}^n$, we have

$$\Pr_M\left[(1 - \eta)\|x\|^2 \le \frac{1}{k}\|Mx\|^2 \le (1 + \eta)\|x\|^2\right] \ge 1 - 2\exp(-\eta^2 k / 8). \qquad (1)$$
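
The statement of Theorem 2 can be observed empirically. The sketch below is an illustration under stated assumptions: the helper name `gaussian_projection_norm_sq`, the test vector, the seed, and the values of $\eta$ and $k$ are all chosen arbitrarily for this example. The matrix $M$ is never stored; each row is sampled, used once, and discarded.

```python
import random

def gaussian_projection_norm_sq(x, k, rng):
    """Return ||Mx||^2 / k for a fresh k x n matrix M with i.i.d. N(0, 1) entries."""
    total = 0.0
    for _ in range(k):
        row_dot = sum(rng.gauss(0.0, 1.0) * xi for xi in x)
        total += row_dot * row_dot
    return total / k

rng = random.Random(2013)          # fixed seed, chosen arbitrarily
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
eta, k = 0.25, 2000
norm_sq = sum(xi * xi for xi in x)
estimate = gaussian_projection_norm_sq(x, k, rng)
within_bounds = (1 - eta) * norm_sq <= estimate <= (1 + eta) * norm_sq
print(norm_sq, estimate, within_bounds)
```

With these parameters the failure probability bound in equation (1) is $2\exp(-0.25^2 \cdot 2000 / 8)$, which is below $10^{-6}$, so the estimate is expected to fall inside the stated interval.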

Fast JL Transform. Ailon and Chazelle [2] gave an elegant transform that is
asymptotically faster than the traditional JL transform. It involves
preconditioning the input. In this section, $m$ denotes the number of
$n$-dimensional points on which the transform is applied and $k$ denotes the
target dimension. More specifically, the fast JL transform is $M = PWD$,
where (i) $W$ is an $n \times n$ normalized Walsh-Hadamard matrix, (ii) $D$
is an $n \times n$ diagonal matrix, where
$\Pr[D_{ii} = 1] = \Pr[D_{ii} = -1] = 1/2$, and (iii) $P$ is a $k \times n$
matrix whose elements are independently distributed as follows. With
probability $1 - q$, set $P_{ij} = 0$; otherwise draw $P_{ij}$ from a normal
distribution of expectation 0 and variance $1/q$. The constant $q$ is called
the sparsity constant and is set to
$q = \Theta\left(\frac{\eta^{p-2} \log^p m}{n}\right)$, where $p$ is the norm
we wish to preserve. Since $W$ encodes the discrete Fourier transform, using
the fast Fourier transform, Ailon and Chazelle [2] proved that the transform
satisfies equation (1) and takes time $O(n + qm/\eta^2)$.
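
The role of the preconditioning product $WD$ can be seen in a few lines. The sketch below is an illustration, not the full fast JL transform: it implements only $W$ and $D$ (the sparse matrix $P$ is omitted), the helper names `normalized_walsh_hadamard` and `precondition` are hypothetical, and the input length is assumed to be a power of two. Because $W$ and $D$ are both orthogonal, the Euclidean norm is preserved exactly, while a vector concentrated on one coordinate is spread evenly over all coordinates, which is what allows the subsequent projection to be sparse.

```python
import math
import random

def normalized_walsh_hadamard(x):
    """Apply the normalized Walsh-Hadamard matrix in O(n log n); len(x) is a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def precondition(x, rng):
    """Compute W D x, where D is a diagonal matrix of independent random signs."""
    signed = [xi if rng.random() < 0.5 else -xi for xi in x]
    return normalized_walsh_hadamard(signed)

rng = random.Random(7)                             # fixed seed, chosen arbitrarily
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]       # a maximally "spiky" input
y = precondition(x, rng)
print(y)
```

Every output coordinate has magnitude $1/\sqrt{8}$, so no single coordinate carries a large share of the norm.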


3 Sanitizer for Cut Queries 

In this section, we give our basic sanitizer for sparse graphs. Our key
observation is that, in the sanitizer of BBDS, the overlay of the complete
graph is required to maintain high conductance, and the result regarding the
second smallest eigenvalue of the Laplacian follows immediately from the
fact that a complete graph is a subgraph of the resulting graph.
Unfortunately, this perturbation, when applied to sparse graphs, destroys
the structural benefits of sparsity.

We achieve the same two objectives by overlaying an expander graph. An
expander graph makes the graph connected, while the smallest non-zero
eigenvalue has the desired lower bound if we choose our expander graph with
care. Recently, Friedman proved that a random graph is an expander graph
with high probability. In fact, he showed that such graphs are Ramanujan
graphs. Marcus, Spielman, and Srivastava recently proved the existential
result for bipartite expanders that match the Ramanujan bound for every
degree $d$.

We first derive a connection between the spectral properties of an expander
graph and a complete graph. The most useful relation for this derivation is
an alternate definition of an expander graph, i.e., a $d$-regular graph
$\mathcal{G}$ is an expander if $\lambda_2(\mathcal{G}) \ge (1 - \varepsilon')d$
for some arbitrary constant $\varepsilon'$. We give our basic sanitizer for
sparse graphs in Figure 1.

Theorem 3. The basic algorithm in Figure 1 preserves
$(\varepsilon, \delta)$-DP, provides a utility of
$(\eta, \tau, \nu)$-approximation, where $\tau \le O((\eta + \varepsilon')ws)$,
and runs in time $O(n^{2+o(1)})$.
Input. An $n$-vertex sparse graph $\mathcal{G}$, and parameters
$\varepsilon, \delta, \eta$.

Output. A Laplacian of a graph $\tilde{L}$.

1. Pick a $d$-regular expander graph $E$.

2. Set $L_{\mathcal{H}} \leftarrow \frac{w}{d} L_E + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}}$.

3. Pick an $n \times n$ matrix $M$ with each entry picked from $\mathcal{N}(0, 1)$.

4. Output $\tilde{L} = (M^T L_{\mathcal{H}} M)/n$.

The sanitizer publishes $\tilde{L}$, and anyone with a set of vertices $S$ as
input can compute $\Phi_{\mathcal{G}}(S)$ as below



where |S| = s. 


Fig. 1 . The Basic Sanitizer 


Proof. We first perform the complexity analysis of the above sanitizer. For a
sparse graph, $m = O(n)$; therefore, using [34], it takes $O(n)$ time to
compute the JL transform (since every column in the Laplacian of a sparse
graph has $O(1)$ entries). Since matrix multiplication takes $\Omega(n^2)$,
this is almost tight for any sanitizer design that uses noise multiplication
for answering cut queries.

The proof of DP proceeds in a similar manner as in BBDS. This is because our
change still fulfills the requirement for which BBDS introduced the complete
graph, i.e., it makes the graph connected. Using Lemma 1, the kernel space
is $\mathrm{Span}\{\mathbf{1}\}$. Also, by a suitable choice of the expander
graph, i.e., one with $1 - \varepsilon' > w/d$, Lemma 4 guarantees that all
the eigenvalues of $\mathcal{H}$ are greater than $w$, which is required in
the privacy analysis of BBDS.

Our proof of the utility guarantee builds on a useful relation between an
expander graph and a complete graph. Let $E$ be a $d$-regular expander graph
such that the eigenvalues of $A_E$ are $\le \varepsilon' d$. From the
expression $L_E = D_E - A_E$, where $D_E$ is the degree matrix of $E$, we
have that all the non-zero eigenvalues of $L_E$ are between
$(1 - \varepsilon')d$ and $(1 + \varepsilon')d$. Therefore, from the
Courant-Fischer formula,

$$(1 - \varepsilon')d\, x^T x \le x^T L_E x \le (1 + \varepsilon')d\, x^T x \quad \forall x \in \mathbb{R}^n. \qquad (2)$$

We wish to relate this to the complete graph. For the complete graph, $K_n$,
the eigenvalues are 0 with multiplicity 1 and $n$ with multiplicity $n - 1$.
Therefore, for every $x$ orthogonal to the all-ones vector,

$$x^T L_{K_n} x = n\, x^T x. \qquad (3)$$

Plugging equation (3) in equation (2), we have

$$(1 - \varepsilon')\frac{d}{n}\, x^T L_{K_n} x \le x^T L_E x \le (1 + \varepsilon')\frac{d}{n}\, x^T L_{K_n} x \quad \forall x \in \mathbb{R}^n. \qquad (4)$$
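
The facts about $K_n$ used in this step can be checked directly. The sketch below is illustrative only: the helper name `complete_graph_quadratic_form` and the two test vectors are assumptions for this example. It relies on the standard identity $x^T L_{K_n} x = \sum_{i<j} (x_i - x_j)^2$ for the unweighted complete graph, which gives $n\, x^T x$ when the entries of $x$ sum to zero and $s(n - s)$ for the characteristic vector of a set of size $s$.

```python
def complete_graph_quadratic_form(x):
    """x^T L_{K_n} x, computed as the sum over all pairs of (x_i - x_j)^2."""
    n = len(x)
    return sum((x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n))

n, s = 7, 3
chi = [1] * s + [0] * (n - s)                        # characteristic vector, |S| = s
centered = [3.0, -1.0, -2.0, 0.5, 0.5, -1.5, 0.5]    # entries sum to zero

cut_in_complete_graph = complete_graph_quadratic_form(chi)
centered_form = complete_graph_quadratic_form(centered)
centered_norm_sq = sum(v * v for v in centered)
print(cut_in_complete_graph, centered_form, n * centered_norm_sq)
```

The value $s(n - s)$ is the quantity that appears in the utility calculation when the complete-graph term is evaluated on $\chi_S$.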
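One way to look at equation (4) is that an expander graph is a sparsified
complete graph. Using equation (4), we have the following approximation:

$$(1 - \varepsilon')\frac{w}{n} L_{K_n} + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}} \preceq \frac{w}{d} L_E + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}} \preceq (1 + \varepsilon')\frac{w}{n} L_{K_n} + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}}. \qquad (5)$$

We can now calculate the utility guarantee using equation (5) and similar
arithmetic as in BBDS. More specifically, the upper bound on the utility
guarantee can be calculated as below.

$$\begin{aligned}
(1 + \eta)\chi_S^T L_{\mathcal{H}} \chi_S &= (1 + \eta)\chi_S^T \left(\frac{w}{d} L_E + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}}\right) \chi_S \\
&\le (1 + \eta)\chi_S^T \left((1 + \varepsilon')\frac{w}{n} L_{K_n} + \left(1 - \frac{w}{d}\right) L_{\mathcal{G}}\right) \chi_S \\
&\le (1 + \eta)(1 + \varepsilon')\frac{w\, s(n - s)}{n} + \left(1 - \frac{w}{d}\right)(1 + \eta)\Phi_{\mathcal{G}}(S).
\end{aligned}$$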

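Therefore, when $w/d \le 1/2$,

$$\frac{1}{1 - \frac{w}{d}}\left((1 + \eta)\chi_S^T L_{\mathcal{H}} \chi_S - \frac{w\, s(n - s)}{n}\right) \le \frac{(\eta + \varepsilon' + \eta\varepsilon')\, w s}{1 - \frac{w}{d}} + (1 + \eta)\Phi_{\mathcal{G}}(S) \le 2(\eta + \varepsilon' + \eta\varepsilon')ws + (1 + \eta)\Phi_{\mathcal{G}}(S).$$

This gives an additive approximation of $O((\eta + \varepsilon')ws)$. The
proof of the lower bound is similar.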

4 Differential Privacy by Sparsification 

In Section 3, we showed that a simple change to the sanitizer of BBDS gives
an efficient sanitizer for sparse graphs. In this section, we consider the
case of an arbitrary graph. In particular, we show that various graph
sparsification techniques also preserve DP. This serves as the second main
contribution of this paper. Intuitively, the result in this section follows
from the observation that, for large enough $n$, the sparsification
techniques can be seen as a random projection. Thus, sparsification composed
with our basic scheme should preserve DP by the composition theorem [10] and
Theorem 3.

In a certain sense, our approach is complementary to the approach used in
Randomized sanitization. The Randomized sanitization [13] constructs a
weighted graph, $\mathcal{H} = (V, E', w')$, such that
$\forall u, v \in V$, the weight of edge $(u, v)$ in $\mathcal{H}$ is
distributed as per the following distribution:
$\Pr[w'_{uv} = 1] = (1 + \varepsilon w_{uv})/2$ and
$\Pr[w'_{uv} = -1] = (1 - \varepsilon w_{uv})/2$.
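
The sampling rule of Randomized sanitization can be written in a few lines. The sketch below is an illustration under stated assumptions: the helper names `randomized_edge_weight` and `expected_weight` are hypothetical, the seed and parameters are arbitrary, and the input weight is assumed to satisfy $|\varepsilon w_{uv}| \le 1$ so that the probabilities are valid (for example, $w_{uv} \in \{0, 1\}$). It also shows why the output is dense: every pair of vertices receives a non-zero weight, even when $w_{uv} = 0$.

```python
import random

def randomized_edge_weight(w_uv, eps, rng):
    """Sample w'_uv in {+1, -1} with Pr[+1] = (1 + eps * w_uv) / 2."""
    p_plus = (1.0 + eps * w_uv) / 2.0
    return 1 if rng.random() < p_plus else -1

def expected_weight(w_uv, eps):
    """E[w'_uv] = (1 + eps * w_uv) / 2 - (1 - eps * w_uv) / 2 = eps * w_uv."""
    return (1.0 + eps * w_uv) / 2.0 - (1.0 - eps * w_uv) / 2.0

rng = random.Random(1)             # fixed seed, chosen arbitrarily
eps = 0.2
samples = [randomized_edge_weight(1, eps, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
print(expected_weight(1, eps), empirical_mean)
```

Dividing the empirical mean by $\varepsilon$ gives an unbiased estimate of the original weight, at the cost of a variance that grows as $\varepsilon$ shrinks.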


4.1 Sanitization of Graphs with High Conductance 

Every randomized sparsification technique picks an edge to be included in a 
sparsified graph with some specified probability distribution. At a high level,

Input. An n-vertex graph G with high conductance, and parameters $\epsilon, \delta, \eta$.
Output. A Laplacian of a graph, L.

1. Convert G to a graph H using [31] or [32] such that H is an $\epsilon$-sparsification of G.

2. Pick an n x n matrix M with each entry picked from $\mathcal{N}(0, 1)$.

3. Output $L = (M^T L_H M)/n$.

The sanitizer publishes L and anyone who has as input the set of vertices S can 
compute $g(S) = (l - f ) Xs^XS- 


Fig. 2. Sanitizer for Graph With High Conductance 


the distribution can be defined either dependent on the local structure or on the 
global structure of the graph. We give our sanitizer for both types of distribution, 
picking the most efficient one for instantiation. In this section, we analyze our
sanitizer, stated in Figure 2, for graphs with high conductance.

The utility guarantee follows from the sparsification guarantee provided by the
respective sparsifiers and the JL transform. The efficiency guarantee follows
from the observation that there are $O(1)$ entries in every row or column
of the Laplacian of a sparse graph, and the run time of step 3 in Figure 2
is governed by the number of non-zero entries in the Laplacian. More concretely,
assume that the sparsification algorithm takes $O(m)$ time to output a graph with
$O(n)$ edges (as we will see, both techniques, that of Spielman and Teng [32] based
on local properties of the graph (Theorem 4) and that of Spielman and Srivastava [31]
based on global properties of the graph (Theorem 6), satisfy these two
conditions). Then, even if the graph is dense, i.e., $m = O(n^2)$, the run time for
sparsification is $O(n^2)$. Therefore, the time taken by the sanitizer is bounded by
$O(n^2)$ (since, in expectation, every column in the Laplacian of the sparse graph
has $O(1)$ entries).

The tricky part is to prove the privacy guarantee. For the privacy guarantee,
we prove that for two neighboring graphs G and G', the respective sparse graphs
differ on at most one edge. We can then apply Lemma O and the rest of the 
proof follows along the same lines as BBDS. For both sparsifiers, we prove that
if the graph has high enough conductance, then the probability distribution on
edges with which the sparsification algorithm picks an edge does not differ by
a lot. We then analyze two types of edges: (i) the edge (a, b) that is present in
G' but not in G and (ii) the edges that are in both G and G'. In the first case,
the probability that the edge (a, b) is present in H' is non-zero and is identically
zero in H. We then prove that if the probability distribution on the edges does
not differ by a lot, then with all but negligible probability, the respective sparse
graphs will differ on at most one edge, i.e., only due to the (possible) presence
of the edge (a, b) in H'. The privacy guarantee follows using the proof of BBDS.

Using Local Sparsification Techniques: Construction of Spielman and Teng [32].
Spielman and Teng [32] proved the following result for any graph.

Theorem 4. [32] There exists an $O(m)$ time algorithm which, on input
$0 < \epsilon, p < 1/2$ and a graph G with n vertices and m edges, outputs a sparse graph H
with $O(n/\epsilon^2)$ edges such that H is an $\epsilon$-approximation of G.

The main construction of Spielman and Teng [32] for arbitrary graphs is a
little complicated and uses techniques of graph decomposition and contraction.
In this section, we will use the construction that works for a graph with high
conductance. In this construction, every edge, e = (i, j), is picked with a
probability $p_{ij}$ determined by the degrees of its end points, the smallest
non-zero eigenvalue of the Laplacian, and $k = \max\{\log_2(3/p), \log_2 n\}$, where
$d_i$ denotes the degree of the vertex i and p is an arbitrary constant in (0, 1/2). It is
an easy exercise to check that the probability distribution on the edges that are
already present does not change by a lot when a new edge is added, due to the
dependence only on the local structure of the graph (the eigenvalue changes by
at most two by Lemma EH and degree of only the two end vertices changes). 

Using the proof outline mentioned above, we have the following theorem for
the sanitizer in Figure 2 when we use the sparsifier of Spielman and Teng [32].

Theorem 5. The algorithm in Figure 2 preserves $(\epsilon, \delta)$-DP, provides an answer
that is a $((1 + \eta)(1 + \epsilon), 0, \nu)$-approximation, and runs in time $O(n^{2+o(1)})$ when
using the sparsifier of [32].

Using Global Sparsification Techniques: Construction of Spielman and Srivastava
[31]. We first recall the spectral sparsifier of Spielman and Srivastava [31].
One alternative way to see their sparsification is that H is a random projection
of the edge matrix of G, where edges are picked according to their importance
in the original graph. The sparsifier constructs a graph H by picking every edge
$e \in E(G)$ with probability $p_e = w_e R_e/(n - 1)$ to be included in H, where $R_e$ is
the effective resistance across the edge e and $w_e$ is the weight of the edge e. The
effective resistance on an edge can be computed as

$$R_e = b_e L_G^{+} b_e^T \;\Rightarrow\; R = B_G L_G^{+} B_G^T. \tag{6}$$

Using the above probability distribution, Spielman and Srivastava [31] proved
the following for any arbitrary graph.

Theorem 6. There exists an $\tilde{O}(m(\log r)/\epsilon^2)$ time algorithm which, on input
$\epsilon > 1/\sqrt{n}$ and an n-vertex, m-edge graph G with ratio r of maximum weight to
minimum weight, outputs a sparse graph H such that H is an $\epsilon$-approximation
of G.

Consider the matrix $\Pi = B L_G^{+} B^T$, where B is the signed edge-vertex matrix.
It is easy to see that $\Pi$ is a projection matrix and has a well-defined spectrum:
eigenvalue 1 with multiplicity n - 1 and 0 otherwise. Also, it has a nice relation
to the probability with which an edge is picked to be placed in H:
$\Pi_{e,e} = \sqrt{w_e}\, R_e \sqrt{w_e} = w_e R_e$ for every edge e = (a, b). Moreover, since the trace of $\Pi$
is n - 1, we have $p_e = \Pi_{e,e}/(n - 1) = \|B L_G^{+}(\chi_a - \chi_b)\|^2/(n - 1)$, where $\chi_a$ is the
characteristic vector of the vertex a.
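
The properties of $\Pi$ stated above can be verified on a small example. The sketch below assumes an unweighted connected graph (so $w_e = 1$ and $L_G = B^T B$) and uses NumPy's pseudo-inverse; the variable names and the example edge list are illustrative only.

```python
import numpy as np

def signed_incidence(n, edges):
    """Signed edge-vertex matrix B: row e has +1 at one end point and -1 at the other."""
    B = np.zeros((len(edges), n))
    for row, (a, b) in enumerate(edges):
        B[row, a] = 1.0
        B[row, b] = -1.0
    return B

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]  # connected, unit weights
B = signed_incidence(n, edges)
L = B.T @ B                      # Laplacian of the unweighted graph
L_pinv = np.linalg.pinv(L)       # Moore-Penrose pseudo-inverse L^+

Pi = B @ L_pinv @ B.T            # projection matrix from the text
effective_resistance = np.diag(Pi)          # R_e for every edge (w_e = 1)
sampling_probability = effective_resistance / (n - 1)
```

Because the trace of $\Pi$ is n - 1 for a connected graph, the values in `sampling_probability` sum to one, which is what makes $p_e$ a probability distribution over the edges.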

We have the following theorem for DP when using the sparsification technique
of Spielman and Srivastava [31].

Theorem 7. If the input graph has high conductance, then the algorithm in
Figure 2 runs in time $O(n^{2+o(1)})$ and preserves $(\epsilon, \delta)$-differential privacy with
a utility of $((1 + \eta)(1 + \epsilon), 0, \nu)$-approximation when using the sparsifier of [31].

A few remarks are in order regarding Theorems 5 and 7. The theorems state that
the sanitizer allows zero additive error. Therefore, if we have a graph with high
conductance, we can get an answer with only multiplicative approximation for
as low tolerance as possible. This stands in stark contrast with Theorem [3] and 
Theorem [3] where there is an additive approximation that governs the tolerance 
achieved by the sanitizer. The reason is that, as the underlying graph has high 
conductance, the smallest non-zero eigenvalue is large. This allows us to remove 
the step where we overlay an expander graph! 

4.2 Sanitizer for Arbitrary Graph 

Before we move to arbitrary graphs, we give an alternative for the local
sparsification technique of Spielman and Teng [32]. They (and Trevisan,
independently) proved an important combinatorial property of an arbitrary graph,
which can be used, in composition with their basic technique for high-conductance
graphs, to prove the sparsification result for any arbitrary graph.

Theorem 8. Let G = (V, E) be an arbitrary graph. Then there exists a
set $E' \subseteq E$ of $k|E|$ edges, such that removal of these edges decomposes the graph
into components, each of which has smallest non-zero eigenvalue at least
$k^2/72 \cdot (\log|E|)^2$. Furthermore, these edges can be found in polynomial time.

The above theorem can be used to get a sparsification algorithm for an
arbitrary graph. Let E' be the set of edges found by the algorithm guaranteed in
Theorem 8. First apply the sparsification algorithm of [32] on all the
components with high conductance, neglecting the edges in the set E'. We recursively
apply the sparsification algorithm on E' until we get a sparse graph, i.e., one
with $|E'| \le O(n)$. The recursion depth is at most $O(\log n)$ rounds, so the overall
run time of the algorithm is still under the bound guaranteed by Theorem 01 
Therefore, one could use the complete sparsification technique that decomposes 
the graph in to graphs of high conductance with few bridge edges between the 
components before the third step of Figure 0J The DP of the sanitizer would 
follow the same idea as in the proof of Theorem 5 because the probability of
picking edges depends on local graph structure, and the utility guarantee would
follow from the partition-then-sample lemma of Spielman and Teng [32] and the
guarantee of the JL transform.

The idea of constructing sparse graphs using recursion also applies to the
sparsifier of Spielman and Srivastava [31], but does not help us in proving
DP because of a subtle reason: the probability distribution in [32] depends locally
on the graph structure, i.e., only on the degree of the end-points of the edges
(it also depends on the smallest non-zero eigenvalue of the Laplacian, but from 
Lemma [3] and Lemma 01 it is easy to prove that the eigenvalue change by at 
most 2). Thus, the probability distribution changes by any significant amount 
only for the edge that is added.

On the other hand, the probability distribution on an edge can change
drastically for Spielman and Srivastava's sparsifier [31]. This is because the probability
distribution depends globally on the whole graph, and if an edge is added, it is
possible that the effective resistance along a different edge changes by a lot. For
example, if a unit weight edge (a, b) is added, then any edge (u, v) that is parallel
to (a, b) sees its effective resistance drop to less than 1. If the conductance of the
graph is not large, then this drop is significant.

The key idea to work around this problem is to maintain the conductance of
the graph. We do this by using an appropriate order of composition: we first overlay
an expander (or complete) graph on top of the input graph and then apply the
Spielman-Srivastava sparsifier [31].
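
The effect of the overlay on the spectrum can be illustrated numerically. The sketch below assumes the complete-graph variant of the overlay, i.e. the mixture $\frac{w}{n} L_{K_n} + (1 - \frac{w}{n}) L_G$ (the expander case with d = n); the function names and the path-graph example are illustrative choices, not taken from the paper.

```python
import numpy as np

def laplacian_from_edges(n, edges):
    """Unweighted graph Laplacian from a list of (u, v) pairs."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def overlay_complete_graph(L_G, w):
    """Return (w/n) * L_{K_n} + (1 - w/n) * L_G (assumed complete-graph overlay)."""
    n = L_G.shape[0]
    L_complete = n * np.eye(n) - np.ones((n, n))
    return (w / n) * L_complete + (1.0 - w / n) * L_G

def smallest_nonzero_eigenvalue(L):
    """Second-smallest eigenvalue of a connected graph's Laplacian."""
    return float(np.sort(np.linalg.eigvalsh(L))[1])

n = 8
L_path = laplacian_from_edges(n, [(i, i + 1) for i in range(n - 1)])  # poorly connected
w = 2.0
L_mixed = overlay_complete_graph(L_path, w)

before = smallest_nonzero_eigenvalue(L_path)
after = smallest_nonzero_eigenvalue(L_mixed)
```

On the subspace orthogonal to the all-ones vector the overlay term acts as w times the identity, so `after` equals `w + (1 - w/n) * before` and is never below w, regardless of how poorly connected the input graph is.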

The utility guarantee follows by incorporating the approximation guarantee
provided by Spielman and Teng [32] or Spielman and Srivastava [31] in the
analysis of Theorem 3.

Theorem 9. The algorithm in Figure 3 preserves $(\epsilon, \delta)$-DP, provides a utility
of $(\eta, \tau, \nu)$-approximation, where $\tau \le O((\eta + \epsilon) w s)$, and runs in time $O(n^{2+o(1)})$.

Input. An n-vertex graph G, and parameters $\epsilon, \delta, \eta$.

Output. A Laplacian of a graph, L.

1. Pick a d-regular expander (or n-vertex complete) graph E.

2. Set $L_G \leftarrow \frac{w}{d} L_E + \left(1 - \frac{w}{d}\right) L_G$.

3. Convert G to a graph H using [31] or [32], such that H is an $\epsilon$-sparsification of G.

4. Pick an n x n matrix M with each entry picked from $\mathcal{N}(0, 1)$.

5. Output $L = (M^T L_H M)/n$.

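The sanitizer publishes the matrix L. For an input $S \subseteq V$, one can compute
the weight of the edges that cross the cut as below

$$\Phi_G(S) = \frac{1}{1-\frac{w}{d}}\left(\chi_S^T L \chi_S - \frac{w\,s(n-s)}{n}\right).$$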



Fig. 3. Sanitizer for Arbitrary Graph 


Remark. Note that we can perform the complexity analysis of all the sanitizers
mentioned in Sections 3 and 4 using the optimization mentioned in Section 5.

4.3 Answering (S, T)-cut Queries

One of the open problems listed by BBDS was to construct a sanitizer that 
answers (S, T)-cut queries on arbitrary graphs. Their concern for using the JL 
transform based mechanism is related to the inner product problem in JL trans- 
form. We get around this problem by making a simple combinatorial observation. 

Let $S, T \subseteq V$ be sets of vertices; we wish to find $\Phi_G(S, T) = \sum_{s \in S,\, t \in T} w_{st}$.
Note that

$$\Phi_G(S) = \sum_{s \in S,\ u \in V \setminus S} w_{su} \quad \text{and} \quad \Phi_G(T) = \sum_{t \in T,\ u \in V \setminus T} w_{tu}.$$

Therefore, $\Phi_G(S) + \Phi_G(T)$ counts the weight of the edges that cross
the boundary of either S or T. These edges are of two types: those
that cross the boundary of either S or T but do not have end vertices
in both sets, and those that have one end in S and the other in T. Note
that we are interested in counting the weight of the edges of the latter form.
Now, $\Phi_G(S) + \Phi_G(T) - \Phi_G(S \cup T)$ is the sum of the weights of the edges
between S and T, counted twice. This observation gives us that
$\Phi_G(S, T) = (\Phi_G(S) + \Phi_G(T) - \Phi_G(S \cup T))/2$. Therefore, anyone with sets of vertices S and
T as input and the sanitized graph from any of the mechanisms in this paper
can compute $\Phi_G(S, T)$ as below

$$\Phi_G(S, T) = \frac{1}{2\left(1-\frac{w}{d}\right)}\left(\chi_S^T L \chi_S + \chi_T^T L \chi_T - \chi_{S \cup T}^T L \chi_{S \cup T}\right) - \frac{1}{2\left(1-\frac{w}{d}\right)}\left(\frac{w\,s(n-s)}{n} + \frac{w\,t(n-t)}{n} - \frac{w\,(s+t)(n-(s+t))}{n}\right).$$

Since computing the (S, T)- cut is three sequential applications of our basic san- 
itizer, DP follows from Theorem [3] (Theorem [S] and [TJ respectively) for sparse 
graphs (high conductance graphs and arbitrary graphs, respectively) and the 
composition theorem of Dwork, Rothblum, and Vadhan [10, Theorem III.1].
The utility guarantee and efficiency guarantee are straightforward, giving the 
following theorem. 

Theorem 10. We can preserve $(\epsilon, \delta)$-DP and provide a utility of $(\eta, \tau, \nu)$-approximation,
where $\tau \le O((\eta + \epsilon) w \max\{s, t\})$, in time $O(n^{2+o(1)})$ for answering
(S, T)-cut queries.
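
The combinatorial identity behind the (S, T)-cut computation can be checked directly on an exact (unsanitized) Laplacian. The sketch below assumes S and T are disjoint, which the counting argument needs; the helper names and the example graph are hypothetical.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Build the n x n graph Laplacian from (u, v, weight) triples."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

def cut(L, S):
    """Phi(S) = chi_S^T L chi_S."""
    chi = np.zeros(L.shape[0])
    chi[list(S)] = 1.0
    return float(chi @ L @ chi)

def st_cut(L, S, T):
    """Phi(S, T) = (Phi(S) + Phi(T) - Phi(S u T)) / 2 for disjoint S and T."""
    S, T = set(S), set(T)
    if S & T:
        raise ValueError("S and T are assumed to be disjoint")
    return (cut(L, S) + cut(L, T) - cut(L, S | T)) / 2.0

edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 3, 4.0), (2, 3, 0.5), (3, 4, 3.0)]
L = laplacian(5, edges)
S, T = {0, 1}, {3}

via_identity = st_cut(L, S, T)
# Direct count of the weight between S and T, for comparison.
direct = sum(w for u, v, w in edges
             if (u in S and v in T) or (u in T and v in S))
```

Replacing the exact Laplacian by a published matrix L turns each of the three `cut` calls into one application of the basic sanitizer, which is why three applications suffice.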

4.4 Comparison with Other Algorithms 

In Table 1 we compare our sanitizer algorithms with other sanitizers that are
proposed in the literature. It is not clear how to compare interactive and
non-interactive sanitizers; therefore, for the additive errors, we have a column for
the case when the total number of cut queries is at most k.


Table 1. Comparison Between our Sanitizers and Other Sanitizers 


Method                          $\tau$ for any k                                Curator's Run Time
Randomized Response [17]        $O(\sqrt{sn \log k}/\epsilon)$                  $O(n^2)$
Exponential Sanitizer [4, 27]   $O(n \log n/\epsilon)$                          Intractable
Multiplicative Weight [17, 19]  $O(\sqrt{|E|} \log k/\epsilon)$                 $O(n^2)$
JL [3]                          $O(s\sqrt{\log k}/\epsilon)$                    $O(n^{2.38})$
Basic Scheme                    $O(s(\eta + \epsilon')\sqrt{\log k}/\epsilon)$  $O(n^{2+o(1)})$
Using Sparsifier                $O(s(\eta + \epsilon')\sqrt{\log k}/\epsilon)$  $O(n^{2+o(1)})$

It is easy to see from Table 1 that our sanitizer almost matches the best
of both worlds: it is almost as efficient as Randomized Response, and its
utility guarantee is as good as that of the JL transform for constant $\epsilon', \epsilon$. For the JL
mechanism, we assume that, for publishing the matrix, the sanitizer uses the
Coppersmith-Winograd matrix multiplication algorithm.

5 Optimization Using Fast-JL Transform 

Our last major contribution is to explore whether some other variants of JL 
transform also preserve DP. We show a positive result for fast JL-transform of 
Ailon and Chazelle [§]; thereby, partially answering an open problem of BBDS. 

Due to lack of space, we just give an overview of our proof. Recall that the
fast-JL transform is the product PWD. The intuitive reason why the fast-JL transform
preserves privacy is that it preconditions the input by performing
a random projection by the matrix D. This is a random projection by the result
of Achlioptas [1]. It then applies a unitary matrix W, which is an FFT, and then another
random projection matrix P. Thus, it can be seen as the application of two
random projections. Using Theorem III.1 of Dwork, Rothblum, and Vadhan [10] and
the main observation of BBDS, it preserves DP. This is our intuition behind the
proof.
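
The structure of the product PWD can be sketched as follows. This is an illustration under stated assumptions rather than the transform analyzed in the paper: W is taken to be a normalized Walsh-Hadamard matrix (a real-valued stand-in for the FFT mentioned above), D is a random sign diagonal, and P is a sparse Gaussian matrix whose entries are non-zero with probability q. The names `hadamard` and `fast_jl` and the parameter values are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Normalized n x n Walsh-Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def fast_jl(n, k, q, rng):
    """Return the three factors (P, W, D) of a fast-JL style transform."""
    D = np.diag(rng.choice([-1.0, 1.0], size=n))          # random signs
    W = hadamard(n)                                        # orthogonal mixing step
    mask = rng.random((k, n)) < q                          # sparsity pattern of P
    P = mask * rng.normal(0.0, 1.0 / np.sqrt(q), size=(k, n))
    return P, W, D

rng = np.random.default_rng(1)
n, k = 8, 4
P, W, D = fast_jl(n, k, q=0.5, rng=rng)

x = rng.normal(size=n)
preconditioned = W @ D @ x                  # norm-preserving preconditioning
projected = (P @ preconditioned) / np.sqrt(k)
```

Since WD is orthogonal, the preconditioning step leaves the norm of x unchanged; only the final multiplication by P reduces the dimension.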

The exact proof uses case analysis. Consider the edge (a, b) that is present
in G' and absent in G. Let $d_1, \ldots, d_n$ be the diagonal entries of the matrix D.
The proof proceeds by considering two cases: when $d_a = d_b$ and when $d_a \ne d_b$. The
first case is almost the same as in BBDS because WD is a unitary matrix. The
upper bound when $d_a \ne d_b$ is also immediate. However, for the lower bound, we
need to analyze the terms in the decomposition of the matrix $W D L_{ab} D^T W^T$, and
the eigenvalues of the projection of $W D L_G$ on the coordinates a, b.

6 Open Problems 

Our technique of using spectral sparsification is very general. We believe it could
be used as a subroutine in many sanitization algorithms that are designed to
answer queries based on spectral properties, to improve their run time. It would
be interesting to investigate other such spectral properties. For example, one
possible candidate for this improvement could be the differentially private low-rank
approximation algorithm of Kapralov and Talwar [23]. This is because Kapralov and
Talwar [23] assume that their private matrices are covariance matrices and publish
the low-rank approximation by computing the singular vectors. Since covariance
matrices are symmetric, one could compute the spectral sparsification.

Another aspect that is still open is to investigate whether other off-the-shelf 
JL transforms also preserve privacy or not. We have partially answered this 
question by studying fast JL transform, but there are many other variants that 
have applicability in different domains of computer science. In particular, we 
believe that any positive result for sparse JL transforms will be a significant step 
in improving the efficiency of our sanitizers and help in better understanding 
the relation between JL transforms and DP.

Abstract. Reductions are the common technique to prove security of
cryptographic constructions based on a primitive. They take an allegedly 
successful adversary against the construction and turn it into a success- 
ful adversary against the underlying primitive. To a large extent, these 
reductions are black-box in the sense that they consider the primitive 
and/or the adversary against the construction only via the input-output 
behavior, but do not depend on internals like the code of the primitive or 
of the adversary. Reingold, Trevisan, and Vadhan (TCC, 2004) provided 
a widely adopted framework, called the RTV framework from hereon, to 
classify and relate different notions of black-box reductions. 

Having precise notions for such reductions is very important when it 
comes to black-box separations, where one shows that black-box reduc- 
tions cannot exist. An impossibility result, which clearly specifies the 
type of reduction it rules out, enables us to identify the potential lever- 
ages to bypass the separation. We acknowledge this by extending the 
RTV framework in several respects using a more fine-grained approach. 
First, we capture a type of reduction — frequently ruled out by so-called 
meta-reductions — which escapes the RTV framework so far. Second, we 
consider notions that are “almost black-box”, i.e., where the reduction
receives additional information about the adversary, such as its success 
probability. Third, we distinguish explicitly between efficient and ineffi- 
cient primitives and adversaries, allowing us to determine how relativiz- 
ing reductions in the sense of Impagliazzo and Rudich (STOC, 1989) fit 
into the picture. 


1 Introduction 

A fundamental question in cryptography refers to the possibility of constructing 
one primitive from another one. For some important primitives like one-way 
functions, pseudorandom generators, pseudorandom functions, and signature 
schemes it has been shown that one can be built from the other one [24], 03 , EmJ . 
For other primitives, however, there are results separating primitives like key 
agreement or collision-resistant hash functions from one-way functions [2bl . Isfil] . 

Separations between cryptographic primitives usually refer to a special kind
of reductions called black-box reductions. These reductions from a primitive P
to another primitive Q treat the underlying primitive Q and/or the adversary
as a black box. Reingold et al. [33] suggested a taxonomy for such reductions
which can be divided roughly into three categories:

Fully Black-Box Reductions: A fully black-box reduction S is an efficient
algorithm that transforms any (even inefficient) adversary A, breaking any
instance $G^f$ of primitive P, into an algorithm $S^{A,f}$ breaking the instance f
of Q. Here, the reduction treats both the adversary as well as the primitive
as a black box, and G is the (black-box) construction out of f.

Semi Black-Box Reductions: In a semi black-box reduction, for any instance
$G^f$ of P, if an efficient adversary $A^f$ breaks $G^f$, then there is an algorithm
$S^f$ breaking the instance f of Q. Here, $S^f$ can be tailor-made for A and f.

Weakly Black-Box Reductions: In a weakly black-box reduction, for any
instance $G^f$ of P, if an efficient adversary A (now without access to f)
breaks $G^f$, then there is an algorithm $S^f$ breaking the instance f of Q.

Reingold et al. [33] indicate that the notion of weakly black-box reductions is
close to free reductions (with no restrictions), such that separation results for this
type of reduction are presumably hard to find. They discuss further notions like
“$\forall\exists$ versions” of the above definitions, where the construction G does not make
black-box use of f but may depend arbitrarily on f, and relativizing reductions
where security of the primitives should hold relative to any oracle. We discuss
these notions later in more detail.

1.1 Black-Box Separation Techniques 

Known black-box separations usually obey the following two-oracle approach:
to separate P from Q, one oracle essentially makes any instance of P insecure,
whereas the other oracle implements an instance of Q. It follows that one cannot
build (in a black-box way) P out of Q. For example, Impagliazzo and Rudich (STOC, 1989)
separate key agreement from one-way permutations by using a PSPACE-complete
oracle to break any key agreement, and a random permutation oracle to realize
the one-way permutation. This type of separation rules out so-called relativizing
reductions, which are in this case equivalent to semi black-box reductions via
embedding of the PSPACE-complete oracle into the black-box primitive 0 . 

Later, Hsiao and Reyzin 0 consider simplified separations for fully black-box 
reductions. Roughly speaking, they move the breaking oracle into the adversary 
such that the reduction can only access this oracle through the adversary (instead 
of directly, as in [2hl]). Because this makes separations often much more elegant 
this technique has been applied successfully for many other primitives, e.g., [111 . 
00010000. 

Interestingly, there has recently been another type of separation based on
so-called meta-reduction techniques, originally introduced by Boneh and
Venkatesan, and subsequently used in many other places. Such meta-reductions
take an alleged reduction from P to Q and show how to use such a reduction to
break the primitive P directly, simulating the adversary for the reduction
usually via rewinding techniques. It turns out that meta-reductions are
somewhat dual to the above notions for black-box reductions. They usually work
against reductions which use the adversary only in a black-box way, whereas the
reduction often receives the description of the primitive f. This notion then
escapes the treatment in [33].

An interesting side effect when the reduction is given the description of f is
that the separation technique then still applies to concrete problems like RSA
or discrete logarithms, and to constructions which use zero-knowledge proofs
relative to f. Such zero-knowledge proofs often rely on Karp reductions of f
to an NP-complete language and therefore on the description of f. In contrast,
for black-box use of the primitive f such constructions do not work in general,
although some of them can still be rescued by augmenting the setup through a
zero-knowledge oracle which allows to prove statements relative to f.
We also remark that in some cases, such as Barak’s ingenious result about
non-black-box zero-knowledge and related results, the security relies on the
code of the adversary instead, though.

1.2 Our Results 

The purpose of this paper is to complement the notions of fully, semi, and weakly 
black-box reductions. We also introduce a more fine-grained view on the involved 
algorithms, such as the distinction between efficient and non-efficient adversaries, 
or the question in how far the framework can deal with the reduction having 
partial knowledge about the adversary. We also formalize meta-reductions in the 
new framework and thus enable classification of this type of separation results. 
We give a comprehensive picture of the relationship of all reduction types. Next 
we discuss these results in more detail. 

As explained above, we extend the classification of black-box reductions to 
other types, like meta-reductions relying on black-box access to the adversary but 
allowing to depend on the primitive’s representation. This, interestingly, also
affects the question of efficiency of the involved algorithms. That is, we believe that
reductions for inefficient and efficient adversaries and primitives should in
general not be subsumed under a single paradigm, if efficiently computable primitives
like one-way functions are concerned. For this class, classical separation
techniques such as the embedding of the adversarially exploited PSPACE-complete
oracle into the primitive do not work anymore. Hence, in this case one would
need to additionally rely on a complexity assumption, such as for example in the
work by Pass et al. [32]. To testify the importance of the distinction between
efficient and inefficient adversaries in black-box reductions we show for example 
that black-box use of efficient adversaries is equivalent to non-black-box use, for 
constructions and reductions which are non-black-box for the primitive. Another 
example where the non-black-box use of the primitive turned out to be crucial 
is in the work by Mahmoody and Pass 0 where non-interactive commitments 
are built from non-black-box one-way functions, whereas constructions out of 
black-box one-way functions provably fail. 

Another issue we address is the question in how far information about the 
adversary available to the reduction may be considered as covered by black- 
box notions. Technically speaking, the running time of an efficient fully black- 
box reduction must not depend on the adversary’s running time, and thus for 
example on the number of queries the adversary makes to the primitive. Else, one 
would need to use a non-standard cost model for the reduction’s oracle queries
to the adversary. We overcome this dilemma by allowing the reduction’s running
time (or other parameters) to depend on adversarial parameters, such as the
number of queries the adversary makes when attacking primitive P. We call this
a parameter-dependent reduction.

We can go even one step further and give the reduction the adversarial param- 
eters as input. This is for example necessary to allow the reduction to depend 
on the adversary’s success probability, but otherwise treating the adversary as a 
black box. A well-known example of such an “almost” fully black box reduction 
is the security proof of the Goldreich-Levin hardcore predicate [19], attributed
to Rackoff in [16]. This reduction depends on the adversary’s success probability
for a majority decision, but does not rely on any specifics of the adversary nor 
the function to be inverted itself. We call such reductions parameter-aware. 

We note that it is up to the designer of the reduction or separation to precisely 
specify the parameters. Such parametrized black-box reductions potentially allow 
authors to counteract the idea behind black-box reductions by placing the adver- 
sary’s code in the parameters and thus making the reduction depend on the adver- 
sary again (via a universal Turing machine). But we assume that such trivial cases 
can be easily detected if the dependency is signaled clearly, just as in the case of a
trivial reduction of a cryptographic protocol to its own security. So far, however, lit- 
erature seems to be often less explicit on which parameters the reduction is based 
upon, and if the reduction should really count as black box. Stating reductions 
clearly as parametrized black-box reduction should make this more prominent. 

In summary, we thus provide a more comprehensive and fine-grained view on 
black-box constructions and separations, allowing to identify and relate separa- 
tions more clearly. In our view, two important results are that we can place rel- 
ativizing reductions between non-black box constructions for inefficient and for 
efficient adversaries, and that for efficient adversaries the question of the reduc- 
tion having black-box access to the adversary, or allowing full dependency on the 
adversary, is irrelevant. This holds as long as the construction and reduction itself 
make non-black-box use of the primitive. From a technical point of view, one of 
the interesting results is clearly that any reduction from the indistinguishability 
of hardcore bits to one-wayness, such as in the Goldreich-Levin case [19], must
depend on the adversary’s success probability (and thus needs to be parametrized).

Nevertheless, we view the contributions in this paper to be primarily on the 
conceptual side. Given the central role that reductions play in modern
cryptography, our impression is that a fundamental (but rather coarse) work like [33]
leaves some potential for refinement. Let us demonstrate this by the following
two examples. 

The Hsiao-Reyzin separation 0 is often termed fully black-box (according 
to [3l|) and considered to be a rather “weak” separation. Our more fine-grained 
picture shows that the separation is actually of the NNN type and thus rather 
a low-level (i.e., strong) separation which cannot be bypassed through, say, any 
non-black-box technique in either direction of the CAP dimensions. Hence, non- 
black-box techniques cannot be used to sidestep this impossibility result; looking 
at efficient adversaries/primitives may help, though.

Similarly, according to [33], meta-reductions only rule out BBB reductions.
So, the framework does not make any distinction between the strength of meta- 
reductions and some oracle separations. However, most meta-reductions today 
rely on unbounded adversaries. As our paper exhibits one might circumvent such 
meta-reductions by switching to the “parallel universe” of efficient adversaries, 
identifying exactly what kind of black-boxness is still admissible according to 
our implications (e.g., if the meta-reduction rules out NBN reductions, then one 
may still manage to find an NBNa reduction). 

Thus, our framework reveals that some impossibility results actually rule out 
a great class of reductions and points exactly to the remaining few leverages to 
give positive results. 

2 Notions of Reducibility 

We extend the original framework for notions of reducibility by Reingold, 
Trevisan and Vadhan [33]. Since we augment the basic notions in various directions, 
we find it useful to use a different terminology for the reduction types. Instead 
of referring to the original terms fully, semi, weakly, and their $\forall\exists$ variants, we 
use a more descriptive three-character “CAP” notation with words from the language 
$\{B, N\}^3$, with the meaning that a ‘B’ in the first position (the C-position) refers 
to the fact that the Construction is black-box, in the second A-position that the 
Adversary is treated as a black-box by the reduction, and in the third P-position 
that the Primitive is treated as a black-box by the reduction. Accordingly, an entry 
‘N’ stands for a non-black-box use. From each combination of constraints, we then 
derive the order of quantification to obtain the actual definitions. 

Hence, a fully black-box reduction in the RTV framework corresponds to a 
BBB-reduction in our notation, and a $\forall\exists$-fully black-box reduction is an 
NBB-reduction in our sense. The CAP notation will later turn out to be handy when 
showing implications from an XYZ-reduction to an $\hat{X}\hat{Y}\hat{Z}$-reduction whenever 
$\hat{X}\hat{Y}\hat{Z}$ is pointwise at most as large as XYZ (with N being smaller than B). It 
also allows one to see immediately that the RTV framework only covers a fraction 
of all 8 possibilities for the CAP choices (although the NNB type is actually not 
meaningful, as we discuss later), and that we fill in the missing types BBN, as 
often ruled out by meta-reductions, and the dual BNB type where the primitive 
but not the adversary is treated as a black-box. 

Extending the RTV framework in another dimension, we differentiate further 
based on the (in)efficiency of the primitives and adversaries. We append the 
suffix ‘a’ to denote an efficiency requirement on the adversary, i.e., a BBBa- 
reduction only works for all probabilistic polynomial-time (PPT) adversaries 
A, while a BBB-reduction is a fully black-box reduction that transforms any 
adversary A into an adversary against another primitive. Likewise, we use ‘p’ 
to indicate that we restrict primitives to those which are efficiently computable; 
the suffix ‘ap’ naturally combines both restrictions. 

2.1 Overview 

At the top of the RTV hierarchy there are fully black-box reductions, or 
BBB-reductions in our CAP terminology. A BBB-reduction from a primitive $\mathcal{P}$ 
to a primitive $\mathcal{Q}$ is a pair $(G, S)$ consisting of a construction $G$ and a 
reduction algorithm $S$. Both treat the primitive in a black-box way and the reduction 
treats the adversary in a black-box way. So, for all adversaries $\mathcal{A}$ and all 
instantiations $f$ of the primitive $\mathcal{Q}$, we have that, if the adversary $\mathcal{A}^f$ breaks 
$G^f$, then the reduction $S^{\mathcal{A},f}$ with black-box access to the adversary $\mathcal{A}$ and 
$f$ breaks the implementation $f$. As a consequence, the existence of primitive 
$\mathcal{Q}$ implies the existence of the primitive $\mathcal{P}$. 







Fig. 1. (a) shows the relation of notions in the RTV framework. The dashed arrows 
indicate equivalence for a restricted class of reductions. In our framework (b) it is 
instructive to look at the vertical planes for fully, *BN, semi, and weakly. The left 
side corresponds to inefficient adversaries, the right side to efficient ones. The front is 
the $\forall\exists$ layer, i.e., non-black-box constructions, and the back corresponds to 
black-box constructions. As NNB-reductions are not meaningful, we only need the BNB 
type (in gray). The w*NN notions are equivalent to the weakly notions of RTV. A notion 
A implies notion B if there is a path of edges between both notions and notion A is 
located above notion B. 


The RTV framework discusses several variants and relaxations of fully black-box 
reductions, called semi, weakly, and relativizing reductions. For semi black-box 
reductions (aka. BNN-reductions) $S$ can depend on both the description 
of the adversary $\mathcal{A}$ and of the instantiation $f$, and only the construction is 
black-box. For weakly black-box reductions (which are also of the BNN type) 
the adversary is additionally restricted to be efficient and does not get oracle 
access to the primitive (but may depend on it). There is a relativizing reduction 
between the primitives $\mathcal{P}$ and $\mathcal{Q}$ if, for all oracles, the primitive $\mathcal{P}$ exists 
relative to an oracle whenever $\mathcal{Q}$ exists relative to this oracle. Figure 1a 
illustrates the relationships between these classes. 

We augment the RTV framework by new classes which represent, among oth- 
ers, reductions that are ruled out by certain meta-reductions. That is, we first 
introduce the notion of BBN-reductions where $S$ has to work for all (black-box) 
adversaries, but may depend on the code of $f$. The other case, where $S$ is universal 
for all black-box $f$ but may depend on $\mathcal{A}$, is called BNB-reduction. In both 
cases the initial ‘B’ indicates that the construction still makes black-box calls to 
the primitive. We remark that semi black-box and weakly black-box reductions 
are of the same BNN type in our notation as they only differ in regard to the 
adversary’s access to $f$. As pointed out in [33], weakly black-box reductions are 
close to free reductions, and black-box separations are presumably only possible 
at the semi level or above. In a sense, our CAP model only captures these levels 
above, and other types like free or relativizing (or weakly) reductions are special. 
For the sake of completeness, we symbolically denote (but do not define) weakly 
reductions w*NN and remark that they essentially correspond to the weakly 
type of RTV. Note that weakly black-box reductions are called mildly black-box 
in some versions of RTV. 

The RTV framework also considers the type of construction (black-box vs. 
non-black-box) and uses the prefix $\forall\exists$ to indicate that construction $G$ does not 
need to be universal for all $f$ but can, instead, depend on the description of $f$. In 
our CAP terminology this “flips” the initial ‘B’ to an ‘N’. By this, we get 8 
combinations, of which 7 are reasonable. The notion of NNB-reduction is not meaningful, 
because we are restricted by the following dependencies: the construction may 
depend on the primitive, the reduction may depend on the adversary, and the 
reduction should be universal for the primitive. Thus, there is only one way to 
order the quantifiers ($\forall \mathcal{A}\ \exists S\ \forall f\ \exists G$), which does not seem to be a 
reasonable notion of security, because the construction can now depend on the 
adversary (and if it does not, we are in the other cases). 

We note that the notion of an NBB-reduction is debatable, because it relies on 
a universal reduction which works for arbitrary constructions. That is, the order 
of quantifiers is $\exists S\ \forall f\ \exists G\ \forall \mathcal{A}$. But since there may indeed be such reductions, 
say, a trivial reduction from a primitive to itself, we do not exclude this type of 
reduction here. 
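
The dependency argument of the last two paragraphs can be checked mechanically. The following Python sketch is an illustration only and does not appear in [33] or [1]: it assumes that each CAP letter induces exactly one ordering constraint among the four quantified objects (written here as G, S, f, A), and it enumerates all quantifier orders compatible with a given CAP string. The names `constraints`, `valid_orders` and `FIGURE_3_ORDERS` are hypothetical; the orders in `FIGURE_3_ORDERS` are transcribed from the summary of definitions in Figure 3.

```python
from itertools import permutations


def constraints(cap):
    """Return (earlier, later) pairs forced by a CAP string such as 'BNB'."""
    c, a, p = cap
    return [
        # Black-box construction: G is fixed before f; otherwise G may depend on f.
        ("G", "f") if c == "B" else ("f", "G"),
        # Black-box adversary: S is fixed before A; otherwise S may depend on A.
        ("S", "A") if a == "B" else ("A", "S"),
        # Black-box primitive: S is fixed before f; otherwise S may depend on f.
        ("S", "f") if p == "B" else ("f", "S"),
    ]


def valid_orders(cap):
    """All quantifier orders over G, S, f, A that respect the CAP constraints."""
    rules = constraints(cap)
    result = []
    for order in permutations("GSfA"):
        position = {name: i for i, name in enumerate(order)}
        if all(position[x] < position[y] for x, y in rules):
            result.append(order)
    return result


# Quantifier orders as listed in Figure 3 (NNB taken from the text above).
FIGURE_3_ORDERS = {
    "BBB": "GSfA", "BNB": "GASf", "BBN": "GfSA", "BNN": "GfAS",
    "NBB": "SfGA", "NBN": "fGSA", "NNN": "fGAS", "NNB": "ASfG",
}

if __name__ == "__main__":
    for cap, order in FIGURE_3_ORDERS.items():
        print(cap, "listed order admissible:", tuple(order) in valid_orders(cap))
    print("NNB admits only:", valid_orders("NNB"))
```

Under these assumptions, NNB is the only type whose three constraints form a single chain, so it admits exactly one order, and in that order G is quantified after A, which matches the objection raised above.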

2.2 Definitions of Reductions 

We next provide definitions of BBB (aka. fully black-box) reductions, BNB and 
BBN reductions; the remaining definitions are delegated to the full version of 
this paper [1]. 

A primitive $\mathcal{Q} = (\mathcal{F}_{\mathcal{Q}}, \mathcal{R}_{\mathcal{Q}})$ is represented as a set $\mathcal{F}_{\mathcal{Q}}$ of random variables, 
corresponding to the set of implementations, and a relation $\mathcal{R}_{\mathcal{Q}}$ that describes 
the security of the primitive as tuples of random variables, i.e., a random variable 
$\mathcal{A}$ is said to break an instantiation $f \in \mathcal{F}_{\mathcal{Q}}$ if and only if $(f, \mathcal{A}) \in \mathcal{R}_{\mathcal{Q}}$. 
CAP    [33] name                                     Remark(s) 
BBB    fully                                         - 
BBN    -                                             known meta-reductions: [8, 22] 
BNB    -                                             known reduction: [19] 
BNN    semi (weakly)                                 - 
NBB    $\forall\exists$-fully                        formally not defined in [33], only “trivial” reductions 
NBN    -                                             known meta-reductions: [6, 22, 14, 11] 
NNB    -                                             not meaningful 
NNN    $\forall\exists$-semi ($\forall\exists$-weakly)   - 

Fig. 2. CAP indicates whether the construction (C), the adversary in the reduction 
(A), or the primitive in the reduction (P) is treated in a black-box (B) or non-black-box 
(N) way 

Following [33], we say that a primitive exists if there is a polynomial-time 
computable instantiation $f \in \mathcal{F}_{\mathcal{Q}}$ such that no polynomial-time random variable 
breaks the primitive. Indeed, [33] demand that primitive sets $\mathcal{F}_{\mathcal{Q}}$ are non-empty, 
but do not motivate this further. We drop this requirement here as reductions 
explicitly depend on primitives, such that one can enforce such non-empty sets 
by investigating only such primitives if necessary. Still, we remark that all our 
implications and separations would work in this case as well. 

For efficient primitives or adversaries we stipulate that the random variable 
is efficiently computable in the underlying machine model which, unless men- 
tioned differently, is assumed to be Turing machines; the results remain valid for 
other computational models like circuit families. Considering security as a general 
relation allows us to cover various (if not all) notions of security: games such 
as CMA-UNF for unforgeability of signature schemes, simulation-based notions 
such as implementing a UC commitment functionality, and even less common 
notions such as distributional one-way functions. In the full version of this paper 
[1] we define as examples the DDH assumption (cast as a primitive) and 
the indistinguishability of the ElGamal encryption scheme. We also present the 
reduction from the ElGamal encryption to the DDH assumption and identify its 
type according to our terminology. Note that a “black-boxness” consideration in 
this particular setting is indeed meaningful, because the DDH assumption can 
hold in a variety of group distributions and the concrete procedures that sample 
from these group distributions can be abstracted away. In the full version 
we discuss another example of weak one-way functions (and the construction 
of strong one-way functions) to highlight that the type of reduction hinges 
on the exact formulation of the underlying primitive: the construction and the 
reduction are then either of the NBN type or of the BBB kind. 

We stress that the distinction between the mathematical object describing the 
adversary as a random variable, and its implementation through, say, a Turing 
machine is important here; else one can find counterexamples to implications 
among black-box reduction types proven in [33]. The problem is roughly that the 
relation may simply be secure because it syntactically excludes all oracle Turing 
machines $\mathcal{A}^f$. We note that Reingold et al. [33] indeed define the relations for 
adversarial machines. Our discussion in [1] shows that only interpreting such 
adversaries as abstract objects sustains the implications in [33]. However, for the 
sake of convenience, we too often refer to $\mathcal{A}^f$ by the machine implementing it, 
even when considering the mathematical random process for relations $\mathcal{R}_{\mathcal{Q}}$. In this 
case it is understood that we actually mean the abstract random variable instead. 
The same holds for the constructions of the form $G^f$ and the first component of 
the security relations. An alternative approach, also presented in the full version, 
is to rely on machines, but to formally introduce semantical relations. These 
relations roughly require that, for any algorithm $\mathcal{A}$ in $\mathcal{R}_{\mathcal{Q}}$, any oracle machine 
$\mathcal{A}^f$ with the same output behavior is also in $\mathcal{R}_{\mathcal{Q}}$. 

We now turn to the actual definitions. Many (but not all) reductions in cryp- 
tography fall into the class of so-called fully black-box reductions, a very re- 
strictive notion, where the reduction algorithm is only provided with black-box 
access to the primitive and the adversary. Throughout the paper, if there is an 
XYZ-reduction from a primitive $\mathcal{P}$ to a primitive $\mathcal{Q}$, we denote this as a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZ-reduction. Note that the correctness requirement is the same for all 
definitions. Therefore, the shorthand notation towards the end of each definition 
covers the security requirement only. 

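Definition 1 ($(\mathcal{P} \hookrightarrow \mathcal{Q})$-BBB or Fully Black-Box Reduction). There exists 
a fully black-box (or BBB-)reduction from a primitive $\mathcal{P} = (\mathcal{F}_{\mathcal{P}}, \mathcal{R}_{\mathcal{P}})$ to a 
primitive $\mathcal{Q} = (\mathcal{F}_{\mathcal{Q}}, \mathcal{R}_{\mathcal{Q}})$ if there exist probabilistic polynomial-time oracle 
algorithms $G$ and $S$ such that: 

Correctness. For every $f \in \mathcal{F}_{\mathcal{Q}}$, it holds that $G^f \in \mathcal{F}_{\mathcal{P}}$. 

Security. For every implementation $f \in \mathcal{F}_{\mathcal{Q}}$ and every machine $\mathcal{A}$, if 
$(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$, then $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$, i.e., 

$\exists \mathrm{PPT}\ G\ \exists \mathrm{PPT}\ S\ \forall f \in \mathcal{F}_{\mathcal{Q}}\ \forall \mathcal{A}\ ((G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}})$. 

Definition 2 ($(\mathcal{P} \hookrightarrow \mathcal{Q})$-BNB-reduction). There exists a BNB-reduction 
from a primitive $\mathcal{P} = (\mathcal{F}_{\mathcal{P}}, \mathcal{R}_{\mathcal{P}})$ to a primitive $\mathcal{Q} = (\mathcal{F}_{\mathcal{Q}}, \mathcal{R}_{\mathcal{Q}})$ if there exists 
a probabilistic polynomial-time oracle machine $G$ such that: 

Correctness. For every $f \in \mathcal{F}_{\mathcal{Q}}$, it holds that $G^f \in \mathcal{F}_{\mathcal{P}}$. 

Security. For every machine $\mathcal{A}$, there is a probabilistic polynomial-time oracle 
algorithm $S$ such that for every implementation $f \in \mathcal{F}_{\mathcal{Q}}$, if $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$, 
then $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$, i.e., 

$\exists \mathrm{PPT}\ G\ \forall \mathcal{A}\ \exists \mathrm{PPT}\ S\ \forall f \in \mathcal{F}_{\mathcal{Q}}\ ((G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}})$. 

Definition 3 ($(\mathcal{P} \hookrightarrow \mathcal{Q})$-BBN-reduction). There exists a BBN-reduction 
from a primitive $\mathcal{P} = (\mathcal{F}_{\mathcal{P}}, \mathcal{R}_{\mathcal{P}})$ to a primitive $\mathcal{Q} = (\mathcal{F}_{\mathcal{Q}}, \mathcal{R}_{\mathcal{Q}})$ if there exists 
a probabilistic polynomial-time oracle machine $G$ such that: 

Correctness. For every $f \in \mathcal{F}_{\mathcal{Q}}$, it holds that $G^f \in \mathcal{F}_{\mathcal{P}}$. 

Security. For every implementation $f \in \mathcal{F}_{\mathcal{Q}}$, there is a probabilistic polynomial-time 
oracle algorithm $S$ such that for every machine $\mathcal{A}$, if $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$, 
then $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$, i.e., 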

3PPTG V/ G Tq 3PPTG VA (( G f ,A f ) G TZ V => ( f,S AJ ) G TZq). 
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Name Summary of definition 


BBB 

3PPTG 3PPTS 

V/ £ Fq VA 

((G f ,A f )e n v => (j,s AJ ) £ n a ) 

BNB 

3PPTG VA 

3PPTS 

V/ £ .Fa 

(( G f ,A f )en v =► (j,s AJ ) e n a ) 

BBN 

3PPTG V/ £ Fq 3PPTS 

VA 

(( G f ,A f )e1Z v =► (J,S AJ ) £ n a ) 

BNN 

3PPTG V/£-F Q VA 

3PPTS 

((G f ,A f )e Hv =k U,S A ’ S ) £ 7 z a ) 

NBB 

3PPTS V/ £ Fa 3PPTG 

VA 

({G f ,A f )en v (f,S AJ ) eKq) 

NBN 

V/ £ Fq 3PPTG 

3PPTS 

VA 

((G f ,A f ) £ Tl-P (f,S A}f ) £ Fq) 

NNN 

V/ £ Fq 3PPTG 

VA 

3PPTS 

((G f ,A f ) en v => (f,S A ’ f ) en Q ) 

weakly-BB 

3PPTG VA 

V/ £ Fq 3PPTS 

((G f ,A)en v ^(f,S A ’ f )eTlQ) 

V3- weakly-BB V/ £ Fq 3PPTG 

VA 

3PPTS 

((G f ,A)en 7 ,^(f,S A ’ f )en Q ) 


Fig. 3. Overview of notions of reducibility 


Note that we always grant $S$ black-box access to $f$ and $\mathcal{A}$, as they may not 
be efficiently computable so that the probabilistic polynomial-time reduction 
algorithm $S$ cannot efficiently simulate them, even if it knows the code of $f$, 
respectively, of $\mathcal{A}$. For a compact summary of all definitions, see Figure 3; the 
full definitions omitted above appear in the full version of this paper [1]. 
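
To see how the quantifier prefixes of the black-box-construction types differ, one can evaluate them over a small finite toy model. The Python sketch below is an illustration under strong simplifying assumptions that are not part of the paper: the construction G is fixed, the candidate reductions, instantiations and adversaries range over finite sets, and a hypothetical predicate `ok(s, f, a)` stands for the implication "if A breaks G^f then S^{A,f} breaks f". Efficiency and oracle access are not modeled.

```python
def bbb(ok, S, F, A):
    """exists S, for all f, for all A."""
    return any(all(ok(s, f, a) for f in F for a in A) for s in S)


def bnb(ok, S, F, A):
    """for all A, exists S, for all f (S may depend on the adversary)."""
    return all(any(all(ok(s, f, a) for f in F) for s in S) for a in A)


def bbn(ok, S, F, A):
    """for all f, exists S, for all A (S may depend on the primitive)."""
    return all(any(all(ok(s, f, a) for a in A) for s in S) for f in F)


def bnn(ok, S, F, A):
    """for all f, for all A, exists S (S may depend on both)."""
    return all(any(ok(s, f, a) for s in S) for f in F for a in A)


# Toy universe: two candidate reductions, two instantiations, two adversaries.
S = F = A = (0, 1)


def needs_adversary(s, f, a):
    """Hypothetical predicate: the reduction succeeds only if it 'matches' the adversary."""
    return s == a


def needs_primitive(s, f, a):
    """Hypothetical predicate: the reduction succeeds only if it 'matches' the primitive."""
    return s == f


if __name__ == "__main__":
    for name, ok in [("needs_adversary", needs_adversary),
                     ("needs_primitive", needs_primitive)]:
        print(name, {q.__name__: q(ok, S, F, A) for q in (bbb, bnb, bbn, bnn)})
```

In this toy setting the first predicate admits a BNB-style and a BNN-style reduction but no BBB-style or BBN-style one, and the second predicate behaves dually, which mirrors the roles of the A-position and the P-position in the CAP notation.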

2.3 Efficient versus Inefficient Algorithms 

Reductions usually run the original adversary as a subroutine. However, in many 
cases, the reduction does not use the code of the original adversary, but instead 
only transforms the adversary’s inputs and outputs. Thus, one might consider 
the reduction algorithm as having black-box access to the adversary only. An 
efficient reduction can then also be given black-box access to an inefficient ad- 
versary, and, maybe surprisingly, most reductions even work for inefficient ad- 
versaries. Imagine, for example, the case that one extracts a forgery against a 
signature scheme from a successful intrusion attack against an authenticated 
channel. Then, the extraction usually still works for inefficient adversaries. On 
the other hand, (unconditional) impossibility results often require the reduction 
algorithm to be able to deal with inefficient adversaries. 

When designing a fine-grained framework for notions of reducibility, one thus 
needs to decide whether one considers efficient or inefficient adversaries. Reingold 
et al. [33] defined their most restrictive notion of reductions, the fully-BB- 
reductions (aka. BBB), for inefficient adversaries. In contrast, their notion of 
semi-BB-reduction treats only efficient adversaries, thus making it easier to find 
such a reduction. Surprisingly, even for such a weak notion, they were able to 
give impossibility results. The reason is that they used inefficient primitives, 
which allow one to embed arbitrary oracles so that they could make use of two- 
oracle separation techniques. Hence, the efficiency question does not only apply 
to adversaries, but also to the primitives (and, consequently, to the combination 
of both). We postpone the treatment of the case of primitives for now and refer 
the reader to Section 2.6. 

We now define the efficient adversary analogues of the notions of reduction 
introduced in the previous section. Note that we still give the reduction $S$ oracle 
access to the adversary $\mathcal{A}$ in all notions, even though the latter can be dropped 
for all cases where $S$ depends on $\mathcal{A}$ in a non-black-box way. In these cases, a 
probabilistic polynomial-time reduction $S$ can simulate the now likewise efficient 
adversarial algorithm $\mathcal{A}$. For consistency, though, we keep the $\mathcal{A}$ oracles in the 
definitions. To distinguish the two cases of efficient and unbounded adversaries, 
we denote by BBBa-reduction a reduction only dealing with efficient adversaries. 

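Definition 4 ($(\mathcal{P} \hookrightarrow \mathcal{Q})$-BBBa-reduction for Efficient Adversaries). 
There exists a BBBa-reduction from a primitive $\mathcal{P} = (\mathcal{F}_{\mathcal{P}}, \mathcal{R}_{\mathcal{P}})$ to a 
primitive $\mathcal{Q} = (\mathcal{F}_{\mathcal{Q}}, \mathcal{R}_{\mathcal{Q}})$ if there exist probabilistic polynomial-time oracle 
machines $G$ and $S$ such that: 

Correctness. For every $f \in \mathcal{F}_{\mathcal{Q}}$, it holds that $G^f \in \mathcal{F}_{\mathcal{P}}$. 

Security. For every implementation $f \in \mathcal{F}_{\mathcal{Q}}$ and every probabilistic polynomial-time 
machine $\mathcal{A}$, if $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$, then $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$, i.e., 

$\exists \mathrm{PPT}\ G\ \exists \mathrm{PPT}\ S\ \forall f \in \mathcal{F}_{\mathcal{Q}}\ \forall \mathrm{PPT}\ \mathcal{A}\ ((G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}})$. 

Again, the definitions for the remaining types of reductions are presented in 
the full version of this paper [1]. 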


2.4 Relations amongst the Definitions 

We first note that a number of implications among the reductions are immediately 
clear by simply shifting quantifiers, that is, if we have a universal quantifier, there is 
certainly an existential version of the reduction in question. The next theorem 
states this formally; we omit the proof because it is purely syntactical. 

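Theorem 1. Let XYZ and $\hat{X}\hat{Y}\hat{Z}$ be two types of CAP reductions such that 
$\hat{X}\hat{Y}\hat{Z} \le$ XYZ point-wise (where N $\le$ B) and let $\mathcal{P}$ and $\mathcal{Q}$ be two primitives. 
If there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZ-reduction, then there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-$\hat{X}\hat{Y}\hat{Z}$-reduction. 
Also, if there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZa-reduction, then there is a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-$\hat{X}\hat{Y}\hat{Z}$a-reduction. 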
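
The point-wise order used in Theorem 1 is easy to enumerate. The following Python sketch is only an illustration of the statement (the names `CAP_TYPES`, `at_most` and `implied_types` are hypothetical): it lists, for a given CAP type, all meaningful types that are point-wise at most as large, i.e., the types whose reductions Theorem 1 yields. The NNB type is excluded because it is not meaningful.

```python
from itertools import product

# All words over {B, N} of length three, in the order BBB, BBN, ..., NNN.
CAP_TYPES = ["".join(t) for t in product("BN", repeat=3)]


def at_most(x, y):
    """True if CAP type x is point-wise at most y, where N is smaller than B."""
    rank = {"N": 0, "B": 1}
    return all(rank[a] <= rank[b] for a, b in zip(x, y))


def implied_types(xyz):
    """Meaningful CAP types implied by an xyz-reduction according to Theorem 1."""
    return [t for t in CAP_TYPES if t != "NNB" and at_most(t, xyz)]


if __name__ == "__main__":
    for cap in CAP_TYPES:
        if cap != "NNB":
            print(cap, "implies", implied_types(cap))
```

For example, a BBB-reduction yields all seven meaningful types, while an NBN-reduction yields only NBN and NNN.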

In the full version of this paper [1], we prove by means of counterexamples that 
for all notions for inefficient adversaries, almost all the above implications are, 
indeed, strict. These separations are split into a number of interesting observations. 
For example, we prove that the Goldreich-Levin hardcore bit reduction [19] 
has to depend on the success probability of the adversary (Theorem D.3 of [1]). 
Moreover, we show that the construction of one-way functions out of weak 
one-way functions needs to depend on the weakness parameter of the weak 
one-way function (Theorem D.2 of [1]). Interestingly, some of the implications 
of Theorem 1 are not strict when one is concerned with reductions for efficient 
adversaries. Maybe surprisingly, NNNa-reductions and NBNa-reductions 
are, indeed, equivalent. Note that this means that knowledge of the code of the 
adversary does not lend additional power to the reduction: 

Theorem 2 (Equivalence of NNNa and NBNa). For all primitives $\mathcal{P}$ and 
$\mathcal{Q}$, there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-NBNa-reduction if and only if there is a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-NNNa-reduction. 

Proof. Using straightforward logical deductions, it follows that NBNa-reductions 
imply NNNa-reductions. For the converse direction, assume that we have two 
primitives $\mathcal{P}$ and $\mathcal{Q}$ such that there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-NNNa-reduction. We now 
have to show that there also is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-NBNa-reduction, that is, we have to 
give a reduction algorithm $S$ that depends on $f$ in a non-black-box way, and yet 
$S$ depends on $\mathcal{A}$ only in a black-box way. We proceed by case distinction over $f$. 

Case I: Suppose $f \in \mathcal{F}_{\mathcal{Q}}$ such that for all constructions $G$, the primitive $G^f$ 
is a secure implementation of $\mathcal{P}$, i.e., for all polynomial-time adversaries $\mathcal{A}$ it 
holds that $(G^f, \mathcal{A}^f) \notin \mathcal{R}_{\mathcal{P}}$. Then proving the existence of a reduction satisfying 
the implication $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$ is trivial, as the premise of 
the implication is never satisfied. 

Case II: For any $f \in \mathcal{F}_{\mathcal{Q}}$ outside the class described in Case I, we know that 
there exists a PPT construction $G$ such that for all $\mathcal{A}$ there is a reduction 
algorithm $S$ that satisfies $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$, and such an efficient 
$\mathcal{A}$ with $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$ exists. For any such $f$, we now fix a unique adversary 
$\mathcal{A}_f$, say, by taking the random variable $\mathcal{A}_f$ with the shortest description according 
to a particular encoding, such that it satisfies $(G^f, \mathcal{A}_f) \in \mathcal{R}_{\mathcal{P}}$. For such an 
$\mathcal{A}_f$ let $S$ be a probabilistic polynomial-time reduction making black-box use of 
$\mathcal{A}_f$ such that $(f, S^{\mathcal{A}_f,f}) \in \mathcal{R}_{\mathcal{Q}}$. Consider the oracle algorithm $S_f$ that has the 
same behavior as $S^{\mathcal{A}_f,f}$, but it incorporates $\mathcal{A}_f$ and only has an $f$-oracle. The 
algorithm $S_f$ only depends on $f$, satisfies $(f, S_f^f) \in \mathcal{R}_{\mathcal{Q}}$, and is implementable 
in probabilistic polynomial time, as $S$ and $\mathcal{A}_f$ are both polynomial-time 
algorithms. Thus, regardless of construction $G$, we showed that for all $f$ there is 
an efficient reduction $S$ such that $(f, S^f) \in \mathcal{R}_{\mathcal{Q}}$, namely by choosing $S^f = S_f^f$. 
Thus, we also know that for all $f$, there is a reduction $S$ such that for all $\mathcal{A}$, if 
$(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$ then $(f, S^f) \in \mathcal{R}_{\mathcal{Q}}$. If now, we add an adversary oracle $\mathcal{A}$ that is 
ignored¹ by $S$, we also obtain that $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$. And thus, there is a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-NBNa-reduction. □ 

We now show that, while a reduction for inefficient adversaries always implies a 
reduction for efficient adversaries of the same type, the converse is not true in 
general. 

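Theorem 3. For each XYZ $\in$ {BBB, BNB, BBN, NBB, BNN, NBN, NNN}, there 
are primitives $\mathcal{P}$ and $\mathcal{Q}$ such that there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZa-reduction, but no 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZ-reduction. 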

Proof. For the primitive $\mathcal{P}$ we consider a trivial primitive, namely the constant 
all-zero function, denoted $f_0$. Let $\mathcal{L}$ be an EXPTIME-complete problem. The pair 
$(f_0, \mathcal{A})$ is in the relation $\mathcal{R}_{\mathcal{P}}$ if and only if the adversary $\mathcal{A}$ is a deterministic 
function that decides $\mathcal{L}$. Let $\mathcal{F}_{\mathcal{Q}}$ also consist only of the all-zero function $f_0$. 
The relation $\mathcal{R}_{\mathcal{Q}}$ is empty. Observe that, for efficient adversaries, 
the primitive $\mathcal{P}$ is secure because EXPTIME strictly contains the complexity class 
P [23]. Thus, there is a trivial reduction since the premise of the implication 

$(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$ 

is never satisfied for any efficient adversary $\mathcal{A}$. Hence, for all XYZ $\neq$ NNB, there 
is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZa-reduction. In contrast, inefficient adversaries can break the 
primitive $\mathcal{P}$, while, as $\mathcal{R}_{\mathcal{Q}}$ is empty, no reduction $S$ can break $\mathcal{Q}$, even with 
oracle access to $\mathcal{A}$. Thus, for all XYZ $\in$ {BBB, BNB, BBN, NBB, BNN, NBN, NNN}, there 
is no $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZ-reduction. □ 

¹ Here, we require the relation to be machine-independent. 


2.5 Relativizing Reductions 

In complexity theory as in cryptography, most reductions relativize in the 
presence of oracles, i.e., if a (secure instantiation of the) primitive $\mathcal{P}$ can be built 
from a (secure instantiation of the) primitive $\mathcal{Q}$, then the construction still works 
if, additionally, all parties get access to a random oracle (or any other oracle). 
We say that there is a relativizing reduction from $\mathcal{P}$ to $\mathcal{Q}$ if, for all oracles $\Pi$, 
the primitive $\mathcal{P}$ exists relative to $\Pi$ whenever $\mathcal{Q}$ exists relative to $\Pi$. Often, 
separation results rule out such reductions. 

Definition 5 (Relativizing Reduction). There exists a relativizing reduction 
from a primitive $\mathcal{P}$ to a primitive $\mathcal{Q}$ if, for all oracles $\Pi$, the primitive $\mathcal{P}$ exists 
relative to $\Pi$ whenever $\mathcal{Q}$ exists relative to $\Pi$. A primitive $\mathcal{P}$ is said to exist 
relative to $\Pi$ if there is an $f \in \mathcal{F}_{\mathcal{P}}$ which has an efficient implementation when 
having access to the oracle $\Pi$ such that there is no probabilistic polynomial-time 
algorithm $\mathcal{A}$ with $(f, \mathcal{A}^{\Pi,f}) \in \mathcal{R}_{\mathcal{P}}$. 

We remark that, since we define security relations over random variables and 
not their implementations, it is understood that the implementation of $f$ may 
actually depend on $\Pi$, too. According to Reingold et al. [33], relativizing 
reductions are a relatively restrictive notion of reducibility that they place between 
BBB-reductions and NNNa-reductions. Jumping ahead, we note this is due to their 
treatment of (in)efficient adversaries: they require BBB-reductions to also work 
for inefficient adversaries $\mathcal{A}$, and so do we. In contrast, for NNNa-reductions, 
Reingold et al. allow the reduction algorithm to fail for inefficient adversaries A. 
As we can show, all notions of reducibility for inefficient adversaries, including 
NNN-reductions, imply relativizing reductions, i.e., we can place relativizing re- 
ductions between NNN- and NNNa-reductions showing that, in fact, the notion is 
very liberal compared to notions of reductions that treat inefficient adversaries. 
In contrast, for efficient adversaries, relativizing reductions imply NNNa- and 
(the equivalent) NBNa-reductions and are incomparable to all stronger notions 
that treat efficient adversaries. 

We now prove that relativizing reductions are implied by NNN-reductions 
for inefficient adversaries, i.e., according to Definition C.4 of [1]. The proof is 
inspired by Reingold et al. [33] who show that BBB-reductions imply relativizing 
reductions. 

Theorem 4. If there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-NNN-reduction, then there is a relativizing 
reduction from $\mathcal{P}$ to $\mathcal{Q}$. 

Proof. Assume there is an NNN-reduction between two primitives $\mathcal{P}$ and $\mathcal{Q}$ 
and assume towards contradiction that there is an oracle $\Pi$ such that $\mathcal{Q}$ exists 
relative to this oracle, but $\mathcal{P}$ does not. Let $f \in \mathcal{F}_{\mathcal{Q}}$ be an instantiation of $\mathcal{Q}$ that 
is efficiently computable by an algorithm that has oracle access to $\Pi$ and such 
that $f$ is secure against all efficient oracle machines $S$, i.e., for all probabilistic 
polynomial-time machines $S$, one has $(f, S^{\Pi}) \notin \mathcal{R}_{\mathcal{Q}}$. By assumption of a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-NNN-reduction, there exists a PPT oracle algorithm $G$ for $f$, such that for all 
(possibly unbounded) adversaries $\mathcal{A}$ there is a PPT reduction algorithm $S$ such 
that $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$ implies $(f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$. Now, $G^f$ is efficiently computable 
relative to the oracle $\Pi$, because $G$ is PPT and $f$ is efficiently computable relative 
to $\Pi$. Since $\mathcal{P}$ does not exist relative to $\Pi$, there is an efficient adversary $\mathcal{A}$ such 
that $(G^f, \mathcal{A}^{\Pi}) \in \mathcal{R}_{\mathcal{P}}$, i.e., by considering that the relations are defined over 
random variables, setting $\mathcal{A}' := \mathcal{A}^{\Pi}$ one also has $(G^f, \mathcal{A}') \in \mathcal{R}_{\mathcal{P}}$. Thus, the 
NNN-reduction gives an efficient reduction $S$ such that $(f, S^{\mathcal{A}',f}) \in \mathcal{R}_{\mathcal{Q}}$. As $S$ is 
PPT and as $f$ and $\mathcal{A}'$ are efficiently computable relative to oracle $\Pi$, one has that 
$S^{\mathcal{A}',f}$ is efficiently computable relative to $\Pi$. Thus, $f$ is not “$\mathcal{Q}$-secure” against 
all efficient oracle machines with oracle access to $\Pi$, yielding a contradiction. □ 

This proves that for inefficient adversaries, relativizing reductions are implied by 
NNN-reductions, the most liberal notion of reductions for inefficient adversaries. 
Conversely, for efficient adversaries, relativizing reductions imply NNNa and 
NBNa reductions, but they are not implied by any of the stronger notions. We 
adapt the proof due to Reingold et al. [33] for the following theorem. 

Theorem 5. If there is a relativizing reduction from $\mathcal{P}$ to $\mathcal{Q}$, then there is a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-NNNa-reduction, and a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-NBNa-reduction. 

Proof. It suffices to show that relativizing reductions imply NNNa-reductions 
for efficient adversaries, as Theorem 2 proves that NBNa-reductions and NNNa- 
reductions are equivalent. Assume that there is a relativizing reduction between 
the primitives $\mathcal{P}$ and $\mathcal{Q}$, and assume towards contradiction that there is an 
$f \in \mathcal{F}_{\mathcal{Q}}$ such that for all constructions $G$, there is an efficient adversary $\mathcal{A}$ such 
that for all efficient reduction algorithms $S$, it holds that $(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}}$ but, 
simultaneously, $(f, S^{\mathcal{A},f}) \notin \mathcal{R}_{\mathcal{Q}}$. Then, by definition, relative to oracle $f$, the 
primitive $\mathcal{Q}$ exists, as no efficient algorithm with oracle access to $f$ can break $f$. 
Note that we can view $S^f$ as an algorithm $S^{\mathcal{A},f}$ which does not query $\mathcal{A}$ but 
has the same output distribution, if viewed as random variables. By assumption, 
there exists a relativizing reduction between $\mathcal{P}$ and $\mathcal{Q}$, and thus, relative to the 
oracle $f$, not only $\mathcal{Q}$ exists but also the primitive $\mathcal{P}$. In particular, there is a 
probabilistic polynomial-time oracle machine $G$ such that $G^f$ implements $\mathcal{P}$ and 
such that for all efficient oracle machines $\mathcal{A}$, one has $(G^f, \mathcal{A}^f) \notin \mathcal{R}_{\mathcal{P}}$, i.e., $\mathcal{P}$ is 
secure against all efficient adversaries that get $f$ as an oracle, a contradiction. 

Theorem 6. For XYZ $\in$ {BBB, NBB, BBN, BNB, BNN, NBN, NNN}, there are 
primitives $\mathcal{P}$ and $\mathcal{Q}$ such that there is a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-XYZa-reduction for efficient 
adversaries, but no relativizing reduction. 

Proof. We show that BBBa-reductions do not imply relativizing reductions; as 
BBBa-reductions imply the “lower level” reductions, the other cases follow. We 
use the same approach as for Theorem 3. 

Let $\mathcal{P}$ be the primitive that contains the constant 0-function $f_0$. We define 
the relation $\mathcal{R}_{\mathcal{P}}$ such that $\mathcal{P}$ is trivially secure against all efficient adversaries, 
namely, let $\mathcal{L}$ be an EXPTIME-complete language, then $(f_0, \mathcal{A})$ is in $\mathcal{R}_{\mathcal{P}}$ if $\mathcal{A}$ 
is a deterministic function and decides $\mathcal{L}$. As the complexity class P is strictly 
contained in EXPTIME, no efficient adversary can break $\mathcal{P}$. Let $\mathcal{Q}$ also be the 
primitive that contains the constant 0-function $f_0$, but with a different relation, 
namely $\mathcal{R}_{\mathcal{Q}}$ is empty. In particular, no adversary can break $\mathcal{Q}$. Hence, there is a 
trivial $(\mathcal{P} \hookrightarrow \mathcal{Q})$-BBBa-reduction, because the premise of the implication 

$(G^f, \mathcal{A}^f) \in \mathcal{R}_{\mathcal{P}} \Rightarrow (f, S^{\mathcal{A},f}) \in \mathcal{R}_{\mathcal{Q}}$ 

is never satisfied for efficient adversaries and the implication is thus trivially 
true. In contrast, there is no relativizing reduction between the two primitives. 
That is, assume we add an oracle that decides the EXPTIME-complete language 
$\mathcal{L}$; then relative to this oracle, there are suddenly efficient adversaries that break 
$\mathcal{P}$. However, as $\mathcal{R}_{\mathcal{Q}}$ is still empty, there cannot be a reduction $S$ in this oracle 
world, giving us a contradiction. □ 

Reingold et al. [33] note that BNNa-reductions for efficient adversaries and 
relativizing reductions are often equivalent. In particular, they prove that if a 
primitive $\mathcal{Q}$ allows any oracle $\Pi$ to be embedded into it, then a 
$(\mathcal{P} \hookrightarrow \mathcal{Q})$-BNNa-reduction implies a $(\mathcal{P} \hookrightarrow \mathcal{Q})$-relativizing reduction. However, efficient 
primitives $\mathcal{Q}$ such as one-way functions (as opposed to random oracles, for 
example), are not known to satisfy this property. We discuss this issue in more 
detail in the following section about efficient primitives. 


2.6 Efficient Primitives versus Inefficient Primitives 

A reduction for efficient primitives is a reduction that only works if $f \in \mathcal{F}_{\mathcal{Q}}$ 
is efficiently implementable, i.e., in probabilistic polynomial-time. If we make 
this distinction then, according to Figure 2, we unfold another dimension 
(analogously to the case of efficient adversaries). As we discuss below, our results for 
non-efficient primitives hold in this “parallel universe” of efficient primitives as 
well, and between the two universes there are straightforward implications and 
separations (as in the case of efficient and inefficient adversaries). 

Technically, one derives the efficient primitive version XYZ p of an XYZ- 
reduction by replacing all universal quantifiers over primitives / in Fq by uni- 
versal quantifiers that are restricted to efficiently implementable / in Fq. More 
concretely, we replace V/ £ Fq by the term VPPT f £ Fq. For example, the 
notion of a BBBp-reduction then reads as follows: 
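Definition 6 (($\mathcal{P} \hookrightarrow \mathcal{Q}$)-BBBp or Fully Black-Box Reduction for Ef- 
ficient Primitives). There exists a fully black-box (or BBBp-)reduction for 
efficient primitives from $\mathcal{P} = (\mathcal{F}_\mathcal{P}, \mathcal{R}_\mathcal{P})$ to $\mathcal{Q} = (\mathcal{F}_\mathcal{Q}, \mathcal{R}_\mathcal{Q})$ if there exist proba- 
bilistic polynomial-time oracle algorithms G and S such that: 

Correctness. For every polynomial-time computable function $f \in \mathcal{F}_\mathcal{Q}$, it holds 
that $G^f \in \mathcal{F}_\mathcal{P}$. 

Security. For every polynomial-time computable function $f \in \mathcal{F}_\mathcal{Q}$ and every 
machine A, if $(G^f, A^f) \in \mathcal{R}_\mathcal{P}$, then $(f, S^{A,f}) \in \mathcal{R}_\mathcal{Q}$, i.e., 

$$\exists\mathrm{PPT}\ G\ \ \exists\mathrm{PPT}\ S\ \ \forall\mathrm{PPT}\ f \in \mathcal{F}_\mathcal{Q}\ \ \forall A\ \ \big( (G^f, A^f) \in \mathcal{R}_\mathcal{P} \Rightarrow (f, S^{A,f}) \in \mathcal{R}_\mathcal{Q} \big).$$ 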
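
For comparison, the quantifier chain of the efficient-primitive notion can be placed next to that of the unrestricted notion. The following is only a sketch: it assumes that the unrestricted BBB notion has the same shape as Definition 6 with the quantifier over f left unrestricted, which is what the replacement rule stated above suggests. The authoritative statements are those of the paper and its full version.

```latex
\begin{align*}
\text{BBB:}\quad
  & \exists\mathrm{PPT}\ G\ \ \exists\mathrm{PPT}\ S\ \
    \forall f \in \mathcal{F}_\mathcal{Q}\ \ \forall A\ \
    \big( (G^f, A^f) \in \mathcal{R}_\mathcal{P}
      \Rightarrow (f, S^{A,f}) \in \mathcal{R}_\mathcal{Q} \big) \\
\text{BBBp:}\quad
  & \exists\mathrm{PPT}\ G\ \ \exists\mathrm{PPT}\ S\ \
    \forall\mathrm{PPT}\ f \in \mathcal{F}_\mathcal{Q}\ \ \forall A\ \
    \big( (G^f, A^f) \in \mathcal{R}_\mathcal{P}
      \Rightarrow (f, S^{A,f}) \in \mathcal{R}_\mathcal{Q} \big)
\end{align*}
```

Under this reading the two statements differ only in the range of f. Every efficiently implementable f is in particular an f, so the same G and S that witness the first line also witness the second.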

In the same manner, for any XYZ-reduction, we can define the corresponding 
XYZp-reduction. Similarly, one can transform all reduction types XYZa for 
efficient adversaries into reduction types XYZap for efficient adversaries and ef- 
ficient primitives. Most relations that this paper establishes for XYZ-reductions 
and XYZa-reductions also hold for XYZp- and XYZap-reductions, except for 
the relation to relativizing reductions, where only some of the results carry over, 
see Theorem 2.15 of Jd|. Building on proof ideas of Theorem [oj we also establish 
in Theorem 2.14 of [1] that the implication from reductions for arbitrary prim- 
itives to reductions for efficient primitives is strict. We refer the reader to the 
full version of this paper [1] for formal theorem statements, proofs and further 
discussion of the relations of reductions for efficient primitives. 


3 Parametrized Black-Box Reductions 

Many reductions in cryptography commonly classified as “black box” technically 
do not fall in this class, as a black box reduction algorithm must not have any 
information about the adversary beyond the input/output behavior, except for 
the sole guarantee that it breaks security with non-negligible probability. Strictly 
speaking, this excludes information such as running time, number of queries, or 
the actual success probability of a given adversary. This prompts the question of 
what the “natural” notion of a black-box reduction should be. Not surprisingly, 
the answer is a matter of taste, just like the question whether fully black-box or 
semi black-box is the “right” notion of a black-box reduction. As in the case of 
different notions of black-box reductions, we can nonetheless give a technically 
profound, and yet easy-to-use notion of parametrized black-box reductions (of 
any type). In the full version [1] we motivate and formalize two different degrees 
of parameterization by distinguishing between parameter-aware and parameter- 
dependent reductions. The difference is essentially whether or not the reduction 
algorithm receives the parameter values as input. 

We note that parametrized black-box reductions and separations rely criti- 
cally on the specific parameters. In particular, some of our separations consider 
reductions that are required to depend on, say, the success probability of the 
adversary, as in the case of the Goldreich-Levin hardcore bit. This separation 
does not carry over to the parametrized case. In contrast, separations for 
efficient/inefficient adversaries as well as the theorems on relativized reductions 
still apply. 

Fig. 4. The effect of parametrization (in the case of *BN-reductions). Parametrized 
counterparts of each type partly descend towards the corresponding *NN-reduction 
with full dependency on the adversary. 

More pictorially, one can imagine parametrized black-box reductions in light 
of our Figure 1 as descending from the *B* plane for black-box adversaries 
towards the *N* plane, where the reduction can completely depend on the ad- 
versary, see Figure 4. The parameters and the distinction between awareness 
and dependency determine how far one descends. Analogously, parametriza- 
tion for BBB-reductions means to descend from the top node BBB to BNB 
(also in the case of efficient adversaries). As such, it is clear that implications 
along edge paths remain valid, e.g., a parametrized NBN-reduction still implies 
an NNN-reduction. 

The case of NBB-reductions, however, shows that parametrization cannot 
fully bridge the gap to NNB-reductions. As explained before, the latter type 
with quantification $\forall A\ \exists S\ \forall f\ \exists G$ does not seem to be meaningful, because the 
construction G would now depend on the adversary A. Parametrization of NBB- 
reductions (with quantification $\exists S\ \forall f\ \exists G\ \forall A$) still makes sense, though, because 
the dependency of S on the adversary is only through the running time or 
the input. Put differently, the parametrization allows for the “admissible non- 
black-boxness” for the NBB type of reduction. If one parametrizes the black-box 
access to the primitive, either for the construction or the reduction, then this 
parametrization corresponds to a (partial) shift from the back plane to the front 
plane, respectively from the top *BB plane to the *BN plane. In the full version of this 
paper [1], we establish formal relationships between parameter-awareness and 
parameter-dependency. 
4 Conclusion 

We provide a comprehensive framework to classify black-box reductions more 
precisely. We believe that this is important to fully understand and appreciate 
the implications and limitations of black-box separation results. In particular, 
we point out how subtleties such as different possibilities to define a primitive, 
the distinction between efficient and non-efficient adversaries and primitives, 
or parameterization, affect the results. Such details have previously been often 
neglected, and our work draws more attention to these issues. 
Abstract. We present a unified approach for obtaining general secure computation 
that achieves adaptive-Universally Composable (UC)-security. Using our 
approach we essentially obtain all previous results on adaptive concurrent secure 
computation, both in relaxed models (e.g., quasi-polynomial time simulation), as 
well as trusted setup models (e.g., the CRS model, the imperfect CRS model). 
This provides conceptual simplicity and insight into what is required for adaptive 
and concurrent security, as well as yielding improvements to set-up assumptions 
and/or computational assumptions in known models. Additionally, we provide the 
first constructions of concurrent secure computation protocols that are adaptively 
secure in the timing model, and the non-uniform simulation model. As a corollary 
we also obtain the first adaptively secure multiparty computation protocol in the 
plain model that is secure under bounded-concurrency. 

Conceptually, our approach can be viewed as an adaptive analogue to the re- 
cent work of Lin, Pass and Venkitasubramaniam [STOC ‘09], who considered 
only non-adaptive adversaries. Their main insight was that the non-malleability 
requirement could be decoupled from the simulation requirement to achieve UC- 
security. A main conceptual contribution of this work is, quite surprisingly, that 
it is still the case even when considering adaptive security. 

A key element in our construction is a commitment scheme that satisfies a 
strong definition of non-malleability. Our new primitive of concurrent equivocal 
non-malleable commitments, intuitively, guarantees that even when a man-in-the- 
middle adversary observes concurrent equivocal commitments and decommit- 
ments, the binding property of the commitments continues to hold for 
commitments made by the adversary. This definition is stronger than previous 
ones, and may be of independent interest. Previous schemes that satisfy our 
definition have been constructed in setup models, but either require the existence 
of stronger encryption schemes such as CCA-secure encryption or require in- 
dependent “trapdoors” provided by the setup for every pair of parties to ensure 
non-malleability. A main technical contribution of this work is to provide a con- 
struction that eliminates these requirements and requires only a single trapdoor. 
1 Introduction 

The notion of secure multi-party computation allows mutually distrustful parties to se- 
curely compute a function on their inputs, such that only the (correct) output is obtained, 
and no other information is leaked, even if the adversary controls an arbitrary subset of 
parties. This security is formalized via the real/ideal simulation paradigm, requiring that 
whatever the adversary can do in a real execution of the protocol, can be simulated by 
an adversary (“simulator”) working in the ideal model, where the parties submit their 
inputs to a trusted party who then computes and hands back the output. Properly for- 
malizing this intuitive definition and providing protocols to realize it requires care, and 
has been the subject of a long line of research starting in the 1980s. 

In what is recognized as one of the major breakthroughs in cryptography, strong 
feasibility results were provided, essentially showing that any function that can be ef- 
ficiently computed, can be efficiently computed securely, assuming the existence of en- 
hanced trapdoor permutations (eTDP) [46,27]. However, these results were originally 
investigated in the stand-alone setting, where a single instance of the protocol is run 
in isolation. A stronger notion is that of concurrent security, which guarantees secu- 
rity even when many different protocol executions are carried out concurrently. In this 
work, we focus on the strongest (and most widely used) notion of concurrent security, 
namely universally-composable (UC) security 0. This notion guarantees security even 
when an unbounded number of different protocol executions are run concurrently in an 
arbitrary interleaving schedule and is critical for maintaining security in an uncontrolled 
environment that allows concurrent executions (e.g., the Internet). Moreover, this no- 
tion also facilitates modular design and analysis of protocols, by allowing the design 
and security analysis of small protocol components, which may then be composed to 
obtain a secure protocol for a complex functionality. 

Unfortunately, achieving these strong notions of concurrent security is far more chal- 
lenging than achieving stand-alone security, and we do not have general feasibility re- 
sults for concurrently secure computation of every function. In fact, there are lower 
bounds showing that concurrent security (which is implied by UC security) cannot be 
achieved for general functions, unless trusted setup is assumed [8,9,35]. Previous works 
overcome this barrier either by using some trusted setup infrastructure [8,11,2,7,30,12] 
or by relaxing the definition of security [39,45,3,10,25] (we will see examples below). 

Another aspect of defining secure computation is the power given to the adversary. 
A static (or non-adaptive) adversary is one who has to decide which parties to cor- 
rupt at the outset, before the execution of the protocol begins. A stronger notion is 
one that allows for an adaptive adversary, who may corrupt parties at any time, based 
on its current view of the protocol. It turns out that achieving security in the adaptive 
setting is much more challenging than in the static one. The intuitive reason for this 
is that the simulator needs to simulate messages from uncorrupted parties, but may 
later need to explain the messages (i.e. produce the randomness used to generate those 
messages) when that party is corrupted. Moreover, the simulator must simulate mes- 
sages from uncorrupted parties without knowing their inputs, but when corrupted, must 
explain the messages according to the actual input that the party holds. On the other 
hand, in the real protocol execution, messages must information-theoretically determine 
the actual inputs of the party, both for correctness as well as to ensure that an adver- 
sary is committed to its inputs and cannot cheat. We note that although the setting of 
adaptive corruptions with erasures has been considered in the literature, in our work we 
assume adaptive corruptions without erasures. Here we assume that honest parties can- 
not reliably erase randomness used to generate messages of the protocol and thus when 
corrupted, the adversary learns the randomness used by that party to generate previous 
protocol messages. Clearly, this is the more general and challenging setting. Canetti, 
Lindell, Ostrovsky and Sahai hd provided the first constructions of UC-secure pro- 
tocols with static and adaptive security in the common reference string model (CRS).¹ 
Subsequently, several results were obtained for both the static and adaptive case in 
other trusted-setup models and relaxed-security models. The techniques for achieving 
security against adaptive adversaries are generally quite different than the techniques 
needed to achieve security against static adversaries, and many results for concurrent 
secure computation do not readily extend to the adaptive setting. In fact, several of the 
previous results allowing general concurrent secure computation (e.g., using a trusted 
setup) were only proved for the static case [33,34,42,40,22,30], and extending them to 
the adaptive setting has remained an open problem. 
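
The tension described above (simulate a message without the input, then "explain" it once the party is corrupted) can be illustrated with a toy example. The sketch below is not a construction from this paper: it uses a one-time pad, chosen only because its ciphertexts can be explained as an encryption of any message, and all function names are hypothetical. It also shows the other side of the tension: precisely because any message can be explained, the ciphertext does not determine the input.

```python
import secrets

def simulate_ciphertext(length: int) -> bytes:
    """Simulator: output a 'ciphertext' before knowing the honest party's input."""
    return secrets.token_bytes(length)

def explain(ciphertext: bytes, message: bytes) -> bytes:
    """On corruption: produce the key (the party's randomness) that makes
    the earlier ciphertext consistent with the now-revealed message."""
    return bytes(c ^ m for c, m in zip(ciphertext, message))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """One-time pad decryption, as the adversary would check it."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

# The simulator commits to a transcript first ...
simulated = simulate_ciphertext(5)
# ... and can later explain it for whichever input the party turns out to hold.
key_for_hello = explain(simulated, b"hello")
key_for_world = explain(simulated, b"world")
```

Both explanations pass the adversary's check, i.e. `decrypt(simulated, key_for_hello)` returns `b"hello"` and `decrypt(simulated, key_for_world)` returns `b"world"`.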

In this paper we focus on the strongest notions of security, and study their fundamen- 
tal power and limitations. The main question we ask is: 

Under which circumstances is adaptive concurrent security generally feasible? 

In particular, we refine this question to ask: 

What is the minimum setup required to achieve adaptive concurrent security? 

We address these questions on both a conceptual and technical level. In particular, 
we unify and generalize essentially all previous results in the generic adaptive concur- 
rent setting, as well as providing completely new results (constructions with weaker 
trusted setup requirements, weaker computational assumptions, or in relaxed models 
of security), conceptual simplicity, and insight into what is required for adaptive and 
concurrent secure computation. Our main technical tool is a new primitive of equivocal 
non-malleable commitment. We describe our results in more detail below. 


1.1 Our Results 

We extend the general framework of [33] to obtain a composition theorem that allows 
us to establish adaptive UC-security in models both with, and without, trusted set-up. 
With this theorem, essentially all general UC-feasibility results for adaptive adversaries 
follow as simple corollaries, often improving the set-up and/or complexity theoretic as- 
sumptions; moreover, we obtain adaptive UC secure computation in new models (such 
as the timing model). Additionally, our work is the first to achieve bounded-concurrent 
adaptively-secure multiparty computation without setup assumptions. As such, similar 
to [33], our theorem takes a step towards characterizing those models in which adaptive 
UC security is realizable, and also at what cost. 

Although technically quite different, as mentioned previously, our theorem can be 
viewed as an adaptive analogue of the work of Lin, Pass and Venkitasubramaniam 

1 In the CRS model, all parties have access to a public reference string sampled from a pre- 
specified distribution. 
[33], who study the static case. Their work puts forward the very general notion of 
a “UC-puzzle” to capture the models (or setup assumptions) that admit general static 
UC-security. More precisely, they prove that if we assume the existence of enhanced 
trapdoor permutations and stand-alone non-malleable commitments, static UC-security 
is achievable in any model that admits a UC-puzzle. In this work, we establish an anal- 
ogous result for the more difficult case of adaptive UC-security, as we outline below. 

We start by introducing the notion of an Adaptive UC-Puzzle. Next, we define the 
new primitive (which may be of independent interest), equivocal non-malleable com- 
mitment or EQNMCom, which is a commitment with the property that a man-in-the- 
middle observing concurrent equivocal commitments and decommitments cannot break 
the binding property. We then present a construction of equivocal non-malleable com- 
mitment for any model that admits an adaptive UC-puzzle (thus, requiring this primitive 
does not introduce an additional complexity-theoretic assumption). Finally, we rely on 
a computational assumption that is known to imply adaptively secure OT (analogous to 
the eTDP used by [33], which implies statically secure OT). Specifically, we use simu- 
latable public key encryption Ena. Although a weaker assumption, trapdoor simu- 
latable public key encryption is known to imply semi-honest adaptively secure OT, it is 
unknown how to achieve malicious, adaptive, UC secure OT (in any setup model) from 
only trapdoor simulatable public key encryption. We remark here that, more recently, 
for the static case, Lin et al. show how to extend their framework and rely on the min- 
imal assumptions of stand-alone semi-honest oblivious-transfer and static UC-puzzle 
HU. More concretely, we show the following: 

Theorem 1 (Main Theorem (Informal)). Assume the existence of an adaptive UC- 
secure puzzle £ using some setup T, the existence of an EQNMCom primitive, and 
the existence of a simulatable public-key encryption scheme. Then, for every m-ary 
functionality f, there exists a protocol $\Pi$ using the same set-up T that adaptively 
UC-realizes f. 

As an immediate corollary of our theorem, it follows that to establish feasibility of 
adaptive UC-secure computation in any set-up model, it suffices to construct an adap- 
tive UC-puzzle in that model. Complementing the main theorem, we show that in many 
previously studied models, adaptive UC-puzzles are easy to construct. Indeed, in many 
models the straightforward puzzle constructions for the static case (cf. [33]) are suf- 
ficient to obtain adaptive puzzles; some models require puzzle constructions that are 
more complex (see the full version H3 for details). We highlight some results below. 

Adaptive UC in the “imperfect” String Model. Canetti, Pass and shelat [12] consider 
adaptive UC security where parties have access to an “imperfect” reference string, 
called a “sunspot”, that is generated by an arbitrary efficient min-entropy source (ob- 
tained e.g., by measurement of some physical phenomenon). The CPS-protocol requires 
m communicating parties to share m reference strings, each of them generated using 
fresh entropy. We show that a single reference string is sufficient for UC and adaptively- 
secure MPC (regardless of the number of parties m). 

Adaptive UC in the Timing Model. Dwork, Naor and Sahai [22] introduced the timing 
model in the context of concurrent zero-knowledge, where all players are assumed to 
have access to clocks with a certain drift. Kalai, Lindell and Prabhakaran [30] sub- 
sequently presented a concurrent secure computation protocol in the timing model; 
whereas the timing model of [22] does not impose a maximal upper-bound on the clock 
drift, the protocol of [30] requires the clock-drift to be “small”; furthermore, it requires 
extensive use of delays (roughly $n\Delta$, where $\Delta$ is the latency of the network). Finally, 
[33] showed that UC security against static adversaries is possible also in the unre- 
stricted timing model (where the clock drift can be “large”); additionally, they reduce 
the use of delays to only $O(\Delta)$. To the best of our knowledge, our work is the first to 
consider security against adaptive adversaries in the timing model, giving the first fea- 
sibility results for UC and adaptively-secure MPC in the timing model; moreover, our 
results also hold in the unrestricted timing model. 

Adaptive UC with Quasi-polynomial Simulation. Pass [39] proposed a relaxation of 
the standard simulation-based definition of security, allowing for super polynomial-time 
or Quasi-polynomial simulation (QPS). In the static and adaptive setting, Prabhakaran 
and Sahai [45] and Barak and Sahai [3] obtained general MPC protocols that are con- 
currently QPS-secure without any trusted set-up, but rely on strong complexity assump- 
tions. We achieve adaptive security in the QPS model under relatively weak complexity 
assumptions. Moreover, we achieve a stronger notion of security, which (in analogy 
with [39]) requires that indistinguishability of simulated and real executions holds 
against all quasi-polynomial time distinguishers; in contrast, [3] only achieves 
indistinguishability w.r.t. distinguishers with running-time smaller than that of the simulator. 

Adaptive UC with Non-uniform Simulation. Lin et al. [33] introduced the non-uniform 
UC model, which considers environments that are PPT machines and ideal-model ad- 
versaries that are non-uniform PPT machines, and proved feasibility of MPC in that 
model. Relying on the same assumptions as those introduced by [33] to construct a puzzle 
in the non-uniform model (along with the assumption of the existence of simulatable PKE), 
we show feasibility results for secure MPC in the adaptive, non-uniform UC model. 

Adaptive Bounded-Concurrent Secure Multiparty Computation. Several works 
[34,42,40] consider the notion of bounded-concurrency for general functionalities where 
a single secure protocol $\Pi$ implementing a functionality $f$ is run concurrently, and 
there is an a priori bound on the number of concurrent executions. In our work, we 
show how to construct an adaptive puzzle in the bounded-concurrent setting (with no 
setup assumptions). Thus, we achieve the first results showing feasibility of bounded- 
concurrency of general functionalities under adaptive corruptions. 

In addition to these models, we obtain feasibility of adaptive UC in existing models 
such as the common reference string (CRS) model m , uniform reference string (URS) 
model ifTTl . key registration model 0 , tamper-proof hardware model El , and partially 
isolated adversaries model lf20ll (see the full version H3). For relaxed security models, 
we obtain UC in the quasi-polynomial time model [39,45,3]. 

Beyond the specific instantiations, our framework provides conceptual simplicity, 
technical insight, and the potential to facilitate “translation” of results in the static set- 
ting into corresponding (and much stronger) adaptive security results. For example, 
in recent work of Garg et al. El, one of the results — constructing UC protocols us- 
ing multiple setups when the parties share an arbitrary belief about the setups — can be 
translated to the adaptive model by replacing (static) puzzles with our notion of adap- 
tive puzzles. Other results may require more work to prove, but again are facilitated by 
our framework. 
1.2 Technical Approach and Comparison with Previous Work 

There are two basic properties that must be satisfied in order to achieve adaptive UC 
secure computation: (1) concurrent simulation and (2) concurrent non-malleability. The 
former requirement amounts to providing the simulator with a trapdoor while the latter 
requirement amounts to establishing independence of executions. The simulation part 
is usually “easy” to achieve. Consider, for instance, the common random string (CRS) 
or Uniform Reference String (URS) model where the players have access to a public 
reference string that is sampled uniformly at random. A trapdoor can be easily provided 
to the simulator as the inverse of the reference string under a pseudo-random generator. 
Concurrent non-malleability on the other hand is significantly harder to achieve. For the 
specific case of the CRS model, Canetti et al. m and subsequent works I123I371I show 
that adaptive security can be achieved using a single trapdoor. However, more general 
setup models require either strong computational assumptions, or provide the simulator 
with different and independent trapdoors for different executions. For example, in the 
URS model, ifTTl interpret the random string as a public-key for a CCA-secure encryp- 
tion scheme, and need to assume dense cryptosystems, while in the imperfect random 
string (sunspot) model, m require multiple trapdoors. Other models follow a similar 
pattern, where concurrent non-malleability is difficult. 
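
The remark above that, in the CRS/URS model, the trapdoor is the inverse of the reference string under a pseudo-random generator can be sketched as follows. This is an illustrative sketch only: SHAKE-256 stands in for a length-doubling PRG, and the function names and byte lengths are assumptions made for the example, not part of any protocol in this paper.

```python
import hashlib
import secrets
from typing import Tuple

SEED_BYTES = 16  # length of the PRG seed (assumed for illustration)
CRS_BYTES = 32   # length of the reference string, twice the seed length

def prg(seed: bytes) -> bytes:
    """Stand-in length-doubling PRG; SHAKE-256 is used purely as a placeholder."""
    return hashlib.shake_256(seed).digest(CRS_BYTES)

def real_setup() -> bytes:
    """Real model: the reference string is uniformly random.
    With overwhelming probability it has no preimage under the PRG."""
    return secrets.token_bytes(CRS_BYTES)

def simulated_setup() -> Tuple[bytes, bytes]:
    """Simulator: pick a seed and publish its PRG image as the reference string.
    The seed is the trapdoor; the string looks random to efficient observers."""
    seed = secrets.token_bytes(SEED_BYTES)
    return prg(seed), seed

def is_trapdoor(crs: bytes, candidate: bytes) -> bool:
    """A trapdoor is any seed that the PRG maps to the reference string."""
    return len(candidate) == SEED_BYTES and prg(candidate) == crs

crs, trapdoor = simulated_setup()
```

Only the simulator, who chose the seed, holds a value for which `is_trapdoor(crs, trapdoor)` is true. This gives concurrent simulation a single trapdoor, but by itself says nothing about non-malleability, which is the harder requirement discussed in the text.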

In the static case, [33] provided a framework that allows decoupling the concurrent 
simulation requirement from the concurrent non-malleability requirement. More precisely, they 
show that providing a (single) trapdoor to achieve concurrent simulation is sufficient, 
and once a trapdoor is established concurrent non-malleability can be obtained for free. 
This allows them to obtain significant improvement in computational/set-up assump- 
tions since no additional assumptions are required to establish non-malleability. 

A fundamental question is whether the requirement of concurrent simulation and 
concurrent non-malleability can be decoupled in the case of adaptive UC-security. Un- 
fortunately, the techniques used in the static case are not applicable in the adaptive 
case. Let us explain the intuition. [33] and subsequent works rely on stand-alone non- 
malleable primitives to achieve concurrent non-malleability. An important reason this 
was possible in the static case is that non-malleable primitives can be constructed 
in the plain model (i.e. assuming no trapdoor). Furthermore, these primitives inher- 
ently admit black-box simulation, i.e. involve the simulator rewinding the adversary. 
Unfortunately, in the adaptive case both these properties are difficult to achieve. First, 
primitives cannot be constructed in the plain model since adaptive security requires the 
simulator to be able to simultaneously equivocate the simulated messages for honest 
parties for different inputs and demonstrate their validity at any point in the execution by 
revealing the random coins for the honest parties consistent with the messages. Second, 
as demonstrated in [25], black-box rewinding techniques cannot be employed since the 
adversary can, in between messages, corrupt an arbitrary subset of the players (some 
not even participating in the execution) whose inputs are not available to the simulator. 

In this work, we show, somewhat surprisingly, that a single trapdoor is still sufficient 
to achieve concurrent non-malleability. Although we do not decouple the requirements, 
this establishes that even for the case of adaptive security no additional setup, and there- 
fore no additional assumptions, are required to achieve concurrent non-malleability, 
thereby yielding improvements to complexity and set-up assumptions similar to those of [33]. 
The basic approach we take closely resembles the unified framework of [33]. By 
relying on previous works [40,42,35,11,27], Lin et al. in [33] argue that to construct 
a UC protocol for realizing any multi-party functionality, it suffices to construct a 
zero-knowledge protocol that is concurrently simulatable and concurrently simulation- 
sound.² To formalize concurrent-simulation, they introduce the notion of a UC-puzzle 
that captures the property that no adversary can successfully complete the puzzle and 
also obtain a trapdoor, but a simulator exists that can generate (correctly distributed) 
puzzles together with trapdoors. To achieve simulation-soundness, they introduce the 
notion of strong non-malleable witness indistinguishability and show how a protocol 
satisfying this notion can be based on stand-alone non-malleable commitments. 

A first approach for the adaptive case would be to extend the techniques from [33], 
by replacing the individual components with analogues that are adaptively secure and 
rely on a similar composition theorem. While the notion of UC-puzzle can be strength- 
ened to the adaptive setting, the composition theorem does not hold for stand-alone non- 
malleable commitments. This is because, in the static case, it is enough to consider a 
commitment scheme that is statistically-binding for which an indistinguishability-based 
notion of non-malleability is sufficient; such a notion, when defined properly, is concur- 
rently composable. However, when we consider adaptive security, commitments need 
to be equivocable (i.e., the simulator must be capable of producing a fake commitment 
transcript and inputs for honest committers that allow the transcript to be decommitted 
to both 0 and 1) and such commitments cannot be statistically-binding. Therefore, we 
need to consider a stronger simulation-based notion of non-malleability. Furthermore, 
as mentioned before, an equivocal commitment, even in the stand-alone case, requires 
the simulator to have a trapdoor, which in turn requires some sort of a trusted set-up. 
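
To make the notion of equivocation with a trapdoor concrete, the following toy sketch uses a Pedersen-style commitment. This is a standard textbook example swapped in for illustration, not the EQNMCom construction of this paper. The group parameters are deliberately tiny and insecure, and all names are hypothetical. Whoever knows the trapdoor (the discrete logarithm of H to base G) can open a commitment to any value, which is also why such a scheme cannot be statistically binding.

```python
# Toy parameters (insecure, for illustration only).
P = 23        # safe prime, P = 2*Q + 1
Q = 11        # prime order of the subgroup generated by G
G = 4         # generator of the order-Q subgroup of Z_P^*
TRAPDOOR = 7  # discrete log of H to base G, known only to the simulator
H = pow(G, TRAPDOOR, P)

def commit(message: int, randomness: int) -> int:
    """Commit to `message` as G^message * H^randomness mod P."""
    return (pow(G, message % Q, P) * pow(H, randomness % Q, P)) % P

def equivocate(message: int, randomness: int, new_message: int,
               trapdoor: int = TRAPDOOR) -> int:
    """Given an opening (message, randomness), use the trapdoor to find
    randomness that opens the same commitment to new_message."""
    inverse = pow(trapdoor, -1, Q)  # modular inverse (Python 3.8+)
    return (randomness + (message - new_message) * inverse) % Q

# The simulator commits to 0, then later opens the same commitment to 1.
commitment = commit(0, 5)
fake_randomness = equivocate(0, 5, 1)
assert commit(1, fake_randomness) == commitment
```

Without the trapdoor, producing two openings of the same commitment would reveal the discrete logarithm of H, so binding holds only computationally. The adaptive setting additionally needs equivocation to coexist with non-malleability, which this toy scheme does not address.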

Our approach here is to consider a “strong” commitment scheme, one that is both 
equivocable and concurrently non-malleable at the same time, but relies on a UC-puzzle 
(i.e. single trapdoor), and then establish a new composition theorem that essentially es- 
tablishes feasibility of UC-secure protocols in any setup that admits a UC-puzzle. While 
the core contribution of [33] was in identifying the right notion of UC-puzzle and pro- 
viding a modular analysis, in this work, the main technical novelty is in identifying 
the right notion of commitment that will allow feasibility with a single trapdoor. Once 
this is established, the results from [33] can be extended analogously by constructing 
an adaptively secure UC-puzzle for each model. In fact, in most of the models consid- 
ered in this work, the puzzle constructions are essentially the same as in the static case 
and thus we obtain similar corollaries to [33]. While the general framework for our 
work resembles [33], as we explain in the next section, the commitment scheme and the 
composition theorem are quite different and require an intricate and subtle analysis. 

1.3 Main Tool: Equivocal Non-malleable Commitments 

We define and construct a new primitive called equivocal non-malleable commitments 
or EQNMCom. Such commitments have previously been defined in the works of [15,16] 
but only for the limited case of bounded concurrency and non-interactive commitments. 
In our setting, we consider the more general case of unbounded concurrency as well as 

2 Simulation-soundness is a stronger property that implies and is closely related to non- 
malleability. 
interactive commitments. Intuitively, the property we require from these commitments 
is that even when a man-in-the-middle receives concurrent equivocal commitments and 
concurrent equivocal decommitments, the man-in-the-middle cannot break the binding 
property of the commitment. Thus, the man-in-the-middle receives equivocal commit- 
ments and decommitments, but cannot equivocate himself. Formalizing this notion 
seems to be tricky and has not been considered in the literature before. Previously, non- 
malleability of commitments has been dealt with in two scenarios: 

Non-malleability w.r.t commitment [21,43,32]: This requires that no adversary that 
receives a commitment to value v be able to commit to a related value (even without 
being able to later decommit to this value). 

Non-malleability w.r.t decommitment (or opening) [15,43,19]: This requires that no 
adversary that receives a commitment and decommitment to a value v be able to 
commit and decommit to a related value. 

While the former is applicable only in the case of statistically-binding commit- 
ments, the latter is useful even for statistically-hiding commitments. In this work, we 
need a definition that ensures independence of commitment schemes that additionally 
are equivocable and adaptively secure. Equivocability means that there is a way to commit 
to the protocol without knowing the value being committed to and later open to any value. 
Such a scheme cannot be statistically-binding. Furthermore, since we consider the setting 
where the adversary receives concurrent equivocal decommitments, our definition needs 
to consider non-malleability w.r.t decommitment. Unfortunately, current definitions for 
non-malleability w.r.t decommitment in the literature are defined only in the scenario where 
the commitment phase and decommitment phases are decoupled, i.e. in a first phase, a 
man-in-the-middle adversary receives commitments and sends commitments, then, in a 
second phase, the adversary requests decommitments of the commitments received in the 
first phase, followed by it decommitting its own commitments. For our construction, we 
need to define concurrent non-malleability w.r.t decommitments and such a two phase 
scenario is not applicable as the adversary can arbitrarily and adaptively decide when to 
obtain decommitments. Furthermore, it is not clear how to extend the traditional defini- 
tion to the general case, as at any point, only a subset of the commitments received by the 
adversary could be decommitted and the adversary could selectively decommit based on 
the values seen so far and hence it is hard to define a “related” value. 

We instead propose a new definition, along the lines of simulation-extractability, which
has been defined in the context of constructing non-malleable zero-knowledge proofs
[44]. Loosely speaking, an interactive protocol is said to be simulation extractable if for
any man-in-the-middle adversary A, there exists a probabilistic polynomial time machine
(called the simulator-extractor) that can simulate both the left and the right interaction for
A, while outputting a witness for the statement proved by the adversary in the right
interaction. Roughly speaking, we say that a tag-based commitment scheme (i.e., a commitment
scheme that takes an identifier, called the tag, as an additional input) is concurrent
non-malleable w.r.t opening if for every man-in-the-middle adversary A that participates in
several interactions with honest committers as a receiver (called left interactions) as well
as several interactions with honest receivers as a committer (called right interactions),
there exists a simulator S that can simulate the left interactions, while extracting the
commitments made by the adversary in the right interactions (whose identifiers are different
from all the left identifiers) before the adversary decommits.

A related definition in the literature is that of simulation-sound trapdoor commitments
from [25,37], which considers non-interactive equivocable commitments and requires that
no adversary be able to equivocate even when it has access to an oracle that provides
equivocal commitments and decommitments. This can be thought of as the CCA analogue
for equivocal commitments. We believe that such a scheme would suffice for our
construction; however, it is not clear how to construct such commitments from any trapdoor
(i.e. any set-up), even if we relax the definition to consider interactive commitments.

It is not hard to construct equivocal commitments using trusted set-up. The idea here
is to provide the simulator with a trapdoor with which it can equivocate as well as
extract the commitments on the right (e.g., by relying on encryption). However, to ensure
non-malleability, most constructions in the literature additionally impose CCA-security or
provide independent trapdoors for every interaction. Our main technical contribution
consists of showing how to construct a concurrent non-malleable commitment scheme
in any trusted set-up by providing the simulator with just one trapdoor, i.e. we show how
to construct a concurrent non-malleable commitment scheme w.r.t opening using any
UC-puzzle. We remark here that, in the static case, a stand-alone non-malleable
commitment was sufficient, since the indistinguishability-based notion of non-malleability
allowed for some form of concurrent composition. However, in the adaptive case, it is
not clear if our definition yields a similar composition, and hence we construct a scheme
and prove non-malleability directly in the concurrent setting.

Although our main application of equivocal non-malleable commitments is achiev- 
ing UC-security, these commitments may also be useful for other applications such as 
concurrent non-malleable zero knowledge secure under adaptive corruptions. We be- 
lieve that an interesting open question is to explore other applications of equivocal non- 
malleable commitments and non-malleable commitments with respect to 
decommitment. 

2 Equivocal Non-malleable Commitments 

In this section, we define Equivocal Non-malleable Commitments. Intuitively, these are
equivocal commitments such that even when a man-in-the-middle adversary receives
equivocal commitments and openings from a simulator, the adversary himself remains
unable to equivocate. Since we are interested in constructing equivocal commitments
from any trapdoor (i.e. setup), we will capture trapdoors, more generally, as witnesses
to NP-statements. First, we provide definitions of language-based commitments.

Language-Based Commitment Schemes: We adopt a variant of language-based
commitment schemes introduced by Lindell et al. [36], which in turn is a variant of 1412911 .
Roughly speaking, in such commitments the sender and receiver share a common
input, a statement x from an NP language L. The properties of the commitment scheme
depend on whether x is in L or not, and the binding property of the scheme asserts
that any adversary violating binding can be used to extract an NP-witness for the
statement. We present the formal definition below.

Definition 1 (Language-Based Commitment Schemes). Let L be an NP-language
and $\mathcal{R}$ the associated NP-relation. A language-based commitment scheme (LBCS) for
L is a commitment scheme (S, R) such that:

Computational hiding: For every (expected) PPT machine $R^*$, the following
ensembles are computationally indistinguishable over $n \in \mathbb{N}$:

- $\{\mathrm{sta}^{R^*}_{\langle S,R\rangle}(x, v_1, z)\}_{n \in \mathbb{N},\, x \in \{0,1\}^n,\, v_1, v_2 \in \{0,1\}^n,\, z \in \{0,1\}^*}$

- $\{\mathrm{sta}^{R^*}_{\langle S,R\rangle}(x, v_2, z)\}_{n \in \mathbb{N},\, x \in \{0,1\}^n,\, v_1, v_2 \in \{0,1\}^n,\, z \in \{0,1\}^*}$

where $\mathrm{sta}^{R^*}_{\langle S,R\rangle}(x, v, z)$ denotes the random variable describing the output of
$R^*(x, z)$ after receiving a commitment to v using (S, R).

Computational binding: There exists a polynomial-time witness-extractor algorithm
Ext such that, for any cheating committer $S^*$ that can decommit a commitment to
two different values $v_1, v_2$ on common input $x \in \{0,1\}^n$, Ext outputs w such
that $w \in \mathcal{R}(x)$.
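To make the binding property concrete, the following is a minimal sketch of a witness extractor for a toy Pedersen-style commitment. This is an illustrative stand-in and not the scheme used in this paper: the group parameters `P`, `Q`, `G`, the public value `H`, and the function names `commit` and `ext` are all hypothetical, and the parameters are far too small to be secure. Here the "statement" is `H` and the "witness" is its discrete logarithm; two valid openings of one commitment to different values reveal that witness.

```python
# Toy parameters (assumptions for illustration only, not secure sizes).
P = 23   # prime modulus
Q = 11   # prime order of the subgroup generated by G
G = 4    # generator of the order-Q subgroup of Z_P^*

WITNESS = 7                 # discrete log of H; plays the role of w
H = pow(G, WITNESS, P)      # public value; plays the role of the statement x


def commit(value: int, rand: int) -> int:
    """Pedersen-style commitment G^value * H^rand mod P."""
    return (pow(G, value % Q, P) * pow(H, rand % Q, P)) % P


def ext(v1: int, r1: int, v2: int, r2: int) -> int:
    """Recover the witness from two openings of the same commitment.

    From G^(v1 + w*r1) = G^(v2 + w*r2) we get w = (v1 - v2) / (r2 - r1) mod Q.
    """
    if commit(v1, r1) != commit(v2, r2):
        raise ValueError("openings do not match the same commitment")
    if (v1 - v2) % Q == 0:
        raise ValueError("openings must be to different values")
    inv = pow((r2 - r1) % Q, -1, Q)   # modular inverse (Python 3.8+)
    return ((v1 - v2) * inv) % Q


# A cheating committer that opens one commitment as 3 and as 9 leaks the witness.
recovered = ext(3, 5, 9, 1)
```

In this toy setting `recovered` equals `WITNESS`, mirroring the requirement that Ext outputs a witness for the statement whenever binding is violated.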

We now extend the definition to include equivocability. 

Definition 2 (Language-Based Equivocal Commitments). Let L be an NP-language
and $\mathcal{R}$ the associated NP-relation. A language-based commitment scheme (S, R) for
L is said to be equivocal if there exists a tuple of algorithms (S', Adap) such that the
following holds:

Special-Hiding: For every (expected) PPT machine $R^*$, the following
ensembles are computationally indistinguishable over $n \in \mathbb{N}$:

- $\{\mathrm{sta}^{R^*}_{\langle S,R\rangle}(x, v_1, z)\}_{n \in \mathbb{N},\, x \in L \cap \{0,1\}^n,\, w \in \mathcal{R}(x),\, v_1 \in \{0,1\}^n,\, z \in \{0,1\}^*}$

- $\{\mathrm{sta}^{R^*}_{\langle S',R\rangle}(x, w, z)\}_{n \in \mathbb{N},\, x \in L \cap \{0,1\}^n,\, w \in \mathcal{R}(x),\, v_1 \in \{0,1\}^n,\, z \in \{0,1\}^*}$

where $\mathrm{sta}^{R^*}_{\langle S',R\rangle}(x, w, z)$ denotes the random variable describing the output of
$R^*(x, z)$ after receiving a commitment using (S', R).

Equivocability: Let $\tau$ be the transcript of the interaction between R and S' on common
input $x \in L \cap \{0,1\}^n$, with private input $w \in \mathcal{R}(x)$ and random tape $r \in \{0,1\}^*$
for S'. Then for any $v \in \{0,1\}^n$, Adap$(x, w, r, \tau, v)$ produces a random tape $r'$
such that $(r', v)$ serves as a valid decommitment for C on transcript $\tau$.
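The role of the Adap algorithm can be illustrated with a toy Pedersen-style commitment, where knowledge of a trapdoor lets the committer re-open a commitment to any value. This is only a sketch under stated assumptions: Pedersen commitments are a stand-in and not the Feige-Shamir-based scheme used later in this paper, and the parameters `P`, `Q`, `G`, `TRAPDOOR` and the names `commit` and `adap` are hypothetical choices with no security.

```python
# Toy parameters (assumptions for illustration only, not secure sizes).
P = 23   # prime modulus
Q = 11   # prime order of the subgroup generated by G
G = 4    # generator of the order-Q subgroup of Z_P^*

TRAPDOOR = 7                # plays the role of the witness w
H = pow(G, TRAPDOOR, P)     # public value; plays the role of the statement x


def commit(value: int, rand: int) -> int:
    """Pedersen-style commitment G^value * H^rand mod P."""
    return (pow(G, value % Q, P) * pow(H, rand % Q, P)) % P


def adap(trapdoor: int, value: int, rand: int, new_value: int) -> int:
    """Return randomness r' such that commit(new_value, r') == commit(value, rand).

    Solves value + w*rand = new_value + w*r' (mod Q) for r'.
    """
    inv = pow(trapdoor, -1, Q)        # modular inverse (Python 3.8+)
    return (rand + (value - new_value) * inv) % Q


c = commit(3, 5)                       # commitment produced with some value
new_rand = adap(TRAPDOOR, 3, 5, 9)     # later decide to open it as 9 instead
assert commit(9, new_rand) == c
```

Without the trapdoor, finding such an `r'` would break binding, which is the tension between equivocability and binding that the language-based formulation captures.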

2.1 Definition of Equivocal Non-malleable Commitments 

Let (C, R) be a commitment scheme, and let $n \in \mathbb{N}$ be a security parameter.
Consider man-in-the-middle adversaries that participate in left and right interactions
in which m = poly(n) commitments take place.³ We compare a man-in-the-middle
execution with a simulated execution. In the man-in-the-middle execution, the adversary A
simultaneously participates in m left and right interactions. In the left interactions,
the man-in-the-middle adversary A interacts with C, receiving commitments to values
$v_1, \ldots, v_m$ using identities $id_1, \ldots, id_m$ of its choice. Note that the values
$v_1, \ldots, v_m$ are provided to the committer on the left prior to the interaction. In the right
interactions, A interacts with R, attempting to commit to a sequence of related values, again

3 We may also consider relaxed notions of concurrent non-malleability: one-many, many-one
and one-one secure non-malleable commitments. In a one-one (i.e., a stand-alone secure)
non-malleable commitment, we consider only adversaries A that participate in one left and one
right interaction; in one-many, A participates in one left and many right, and in many-one, A
participates in many left and one right.

using identities of its choice $\tilde{id}_1, \ldots, \tilde{id}_m$; $\tilde{v}_j$ is set to the value decommitted by A in the
j-th right interaction. If any of the right commitments is invalid, its committed value is
set to $\bot$. For any i such that $\tilde{id}_i = id_j$ for some j, set $\tilde{v}_i = \bot$; i.e., any commitment
where the adversary uses the same identity as one of the honest committers is considered
invalid. Let $\mathrm{MIM}^A_{\langle C,R\rangle}(v_1, \ldots, v_m, z)$ denote a random variable that describes the
values $\tilde{v}_1, \ldots, \tilde{v}_m$ and the view of A in the above experiment.
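The bookkeeping rule for the right-hand values in this experiment can be sketched in a few lines. This is only an illustration of the rule as stated above; the function name `right_values`, the use of strings for identities, and the use of `None` for the symbol ⊥ are assumptions made for the example.

```python
from typing import List, Optional, Sequence

BOTTOM = None  # stands in for the symbol "bottom" (an invalid commitment)


def right_values(left_ids: Sequence[str],
                 right_ids: Sequence[str],
                 decommitted: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Compute the values recorded for the right interactions.

    decommitted[j] is the value the adversary opened in the j-th right
    interaction, or BOTTOM if that commitment was invalid. Any right
    interaction that reuses an identity of an honest (left) committer is
    also recorded as BOTTOM.
    """
    honest = set(left_ids)
    return [BOTTOM if rid in honest else val
            for rid, val in zip(right_ids, decommitted)]


# The second right interaction copies the left identity "a", so it is discarded;
# the third was already invalid.
result = right_values(["a", "b"], ["c", "a", "d"], [5, 7, BOTTOM])
```

Excluding copied identities rules out the trivial attack in which the adversary simply relays an honest commitment and its opening.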

In the simulated execution, a simulator S interacts only with receivers on the right as
follows:

1. Whenever the commitment phase of the j-th interaction with a receiver on the right is
completed, S outputs a value $\tilde{v}_j$ as the alleged committed value on a special output
tape.

2. During the interaction, S may output a partial view for a man-in-the-middle
adversary whose right interactions are identical to S's interaction so far. If the view
contains a left interaction where the i-th commitment phase is completed and the
decommitment is requested, then a value $v_i$ is provided as the decommitment.

3. Finally, S outputs a view and values $\tilde{v}_1, \ldots, \tilde{v}_m$. Let $\mathrm{sim}^S_{\langle C,R\rangle}(1^n, v_1, \ldots, v_m, z)$
denote the random variable describing the view output by the simulation and the values
$\tilde{v}_1, \ldots, \tilde{v}_m$.

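Definition 3. A commitment scheme (C, R) is said to be concurrent non-malleable
w.r.t. opening if for every polynomial $p(\cdot)$, and every probabilistic polynomial-time
man-in-the-middle adversary A that participates in at most m = p(n) concurrent
executions, there exists a probabilistic polynomial time simulator S such that the following
ensembles are computationally indistinguishable over $n \in \mathbb{N}$:

$\{\mathrm{MIM}^A_{\langle C,R\rangle}(v_1, \ldots, v_m, z)\}_{n \in \mathbb{N},\, v_1, \ldots, v_m \in \{0,1\}^n,\, z \in \{0,1\}^*}$

$\{\mathrm{sim}^S_{\langle C,R\rangle}(1^n, v_1, \ldots, v_m, z)\}_{n \in \mathbb{N},\, v_1, \ldots, v_m \in \{0,1\}^n,\, z \in \{0,1\}^*}$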

A slightly relaxed definition considers all the values committed to the adversary in the 
left interaction to be sampled independently from an arbitrary distribution D. We show 
how to construct a commitment satisfying only this weaker definition. However, this 
will be sufficient to establish our results. 

Definition 4. A commitment scheme (C, R) is said to be concurrent non-malleable
w.r.t. opening with independent and identically distributed (i.i.d.) commitments if for
every polynomial $p(\cdot)$ and polynomial-time samplable distribution D, and every
probabilistic polynomial-time man-in-the-middle adversary A that participates in at most
m = p(n) concurrent executions, there exists a probabilistic polynomial time
simulator S such that the following ensembles are computationally indistinguishable over
$n \in \mathbb{N}$:

$\{(v_1, \ldots, v_m) \leftarrow D^n : \mathrm{MIM}^A_{\langle C,R\rangle}(v_1, \ldots, v_m, z)\}_{n \in \mathbb{N},\, z \in \{0,1\}^*}$

$\{(v_1, \ldots, v_m) \leftarrow D^n : \mathrm{sim}^S_{\langle C,R\rangle}(1^n, v_1, \ldots, v_m, z)\}_{n \in \mathbb{N},\, z \in \{0,1\}^*}$

Remark 1. Any scheme that satisfies our definition with a straight-line simulator in
essence realizes the ideal commitment functionality with UC-security, as it achieves
equivocation and straight-line extraction. If the simulator is not straight-line, then the
requirement that the left commitments are sampled from i.i.d. distributions is seemingly
inherent. This is because our definition implies security against selective opening
attacks (SOA) and, as proved in El, achieving fully concurrent SOA-security
with (black-box) rewinding techniques is impossible when the distributions of the
commitments are not sampleable (or unknown).

Finally, we consider commitment schemes that are both non-malleable w.r.t opening
and language-based equivocal. In a setup model, the simulator will obtain a trapdoor
via the setup procedure, and the witness relation will satisfy the language requirement.

Definition 5. A commitment scheme (C, R) is said to be an equivocal non-malleable
commitment scheme if it is both a language-based equivocal commitment scheme (see
Definition 2) and concurrent non-malleable w.r.t. opening (see Definition 3).

3 Adaptive UC-Puzzles 

Informally, an adaptive UC-puzzle is a protocol (S, R) between two players, a sender
S and a receiver R, together with a PPT-computable relation $\mathcal{R}$, such that the following
two properties hold:

Soundness: No efficient receiver $R^*$ can successfully complete an interaction with S
and also obtain a “trapdoor” y such that $\mathcal{R}(\mathrm{TRANS}, y) = 1$, where TRANS is the
transcript of the interaction.

Statistical UC-simulation with adaptive corruptions: For every efficient adversary 
A participating in a polynomial number of concurrent executions with receivers R 
(i.e., A is acting as a puzzle sender in all these executions) and at the same time 
communicating with an environment Z, there exists a simulator S that is able to 
statistically simulate the view of A for Z, while at the same time outputting trap- 
doors to all successfully completed puzzles. Moreover, S successfully simulates 
the view even when A may adaptively corrupt the receivers. 

We provide a formal definition in the full version m . In essence, it is the same
definition as in [33], with the additional requirement of adaptive security in the simulation.
We remark that our analysis will require the puzzle to be straight-line simulatable. In
fact, for almost all models considered in this work, this is indeed the case, with the
exception of the timing and partially-isolated adversaries models (for which we argue the
result independently). Using the result of ESI, it is possible to argue that straight-line
simulation is necessary to achieve adaptive security (except when we consider restricted
adversaries, such as in the timing or partially-isolated adversaries models).

4 Achieving Adaptive UC-Security 

In this section, we give a high-level overview of the construction of an EQNMCom
scheme and the proof of Theorem 1, which relies on the existence of an EQNMCom
scheme. For the formal construction and analysis of our EQNMCom scheme, see the
full version IfTTl . A formal proof of Theorem 1 can be found in the full version 11771 .

By relying on previous results HI 1I18I28I14I13I, the construction of an adaptive
UC-secure protocol for realizing any multiparty functionality can be reduced to the task of
constructing a UC-commitment, assuming the existence of simulatable PKE. First, we
show how to construct an equivocal non-malleable commitment scheme based on any
adaptive UC-puzzle. Then, combining the equivocal non-malleable commitment scheme
with a simulatable PKE scheme, we show how to realize the UC-commitment.

4.1 Constructing EQNMCom Based on Adaptive UC-Puzzles 

Our protocol, at a very high level, is a variant of the non-malleable commitment protocol
from El, which in turn is a variant of the protocol from ED. While non-malleability
relies on the message-scheduling technique of the [21,32] protocol, equivocability is
obtained by relying on a variant of Feige-Shamir's trapdoor commitment scheme⁴ and the
adaptively secure witness-indistinguishable proof of knowledge (WIPOK) protocol of
Lindell-Zarosim [36] (see the full version El for a formal definition and construction).
More precisely, our protocol proceeds in two phases: in the preamble phase, the
Committer and Receiver exchange a UC-puzzle where the Receiver is the sender of the
puzzle and the Committer is the receiver of the puzzle (this phase establishes a trapdoor
through which an equivocal commitment can be generated). This is followed by
the commitment phase: here the Committer first commits to its value using a
language-based (non-interactive) equivocal commitment scheme, where the NP-language is the
one corresponding to the UC-puzzle and the particular statement is the puzzle
exchanged in the preamble (this relies on the Feige-Shamir trapdoor commitment scheme).
This is followed by several invocations of an (adaptively-secure) WIPOK where the
Committer proves the statement that either it knows the value committed to in phase
2 or it possesses a solution to the puzzle from phase 1. Here we rely on the
adaptively-secure (without erasures) WIPOK of E3, where the messages are scheduled based on
the Committer's id using the scheduling of ED. This phase allows any Committer
that possesses a solution to the puzzle from the preamble phase to generate a commitment
that can be equivocated (i.e. later be opened to any value). Conversely, any adversary
that can equivocate the non-interactive commitment of the second phase can be used to
obtain a solution to the puzzle. The decommitment information is simply the value and
the random tape of an honest committer consistent with the commitment phase. More
specifically, our protocol proceeds as follows:

1. In the Preamble Phase, the Committer and Receiver exchange a UC-puzzle where
the Receiver is the sender of the puzzle and the Committer is the receiver of the
puzzle. Let x be the transcript of the interaction.

2. In the Committing Phase, the Committer sends $c = \mathrm{EQCom}_x(v)$, where $\mathrm{EQCom}_x$
is a language-based equivocal commitment scheme as in Definition 2 with common
input x. This is followed by the Committer proving that c is a valid commitment
for v. This is proved by $4\ell$ invocations of an adaptively-secure (without erasures)
WIPOK where the messages are scheduled based on the id (as in [21,32]). More
precisely, there are $\ell$ rounds, where in round i, the schedule $\mathrm{design}_{id_i}$ is followed by
$\mathrm{design}_{1-id_i}$ (see Figure 1).

4 Let x be an NP-statement. The sender commits to bit b by running the honest-verifier
simulator for Blum's Hamiltonian Circuit protocol 0 on input the statement x and the verifier
message b, generating the transcript (a, b, z), and finally outputting a as its commitment. In
the decommitment phase, the sender reveals the bit b by providing both b and z.


Fig. 1. Message schedule in a round in adaptively-secure WIPOK

While the protocol is an adaptation of the [32] commitment scheme, where the
individual components are replaced by adaptively-secure alternatives, proving security
requires a substantially different analysis. It is easy to see that concurrent equivocability
of our scheme follows from the UC-puzzle simulation. However, proving concurrent
non-malleability w.r.t opening with i.i.d. commitments is the hard part and the core of
our contribution. Recall that achieving this essentially entails constructing a simulator
for any man-in-the-middle adversary that, while equivocating all commitments to the
adversary (in the left interactions), can extract all the values committed to by
the adversary (in the right interactions) before the decommitment phase.

Towards extracting from the right interactions, we first recall the basic idea in [32,21].
Their scheduling ensures that for every right interaction with a tag that is different from
a left interaction, there exists a point, called a safe-point, from which we can rewind
the right interaction (and extract the committed value) without violating the hiding
property of the left interaction. It now follows from the hiding property of the left
interaction that the values committed to on the right do not depend on the value committed
to on the left. However, this technique only allows for extraction from a right interaction
without violating the hiding property of one left interaction, whereas here we need to
extract without violating the hiding property of all the left interactions.

Our simulator-extractor works as follows: In a main execution with the man-in-the-middle
adversary, the simulator simulates all puzzles to obtain trapdoors, equivocates the
left interactions using the solution of the puzzle, and simulates the right interactions
honestly. Whenever a decommitment on the left is requested, the simulator obtains a
value externally (a value sampled independently from distribution D), which it
decommits to the adversary (this is achieved since the protocol is adaptively secure). After the
adversary completes the commitment phase of a right interaction in the main execution,
the simulator switches to a rewinding phase, where it tries to extract the value
committed to by the adversary in that right interaction. Towards this, it chooses a random
WIPOK (instead of a safe point) from the commitment phase and rewinds the adversary
to obtain the witness used in the WIPOK (using the proof-of-knowledge extractor). In
the rewinding phase, the left interactions are now simulated using the honest committer
strategy (as opposed to equivocating using the solution to the puzzle). More precisely,
in the rewinding phase, for every left interaction that has already been opened (i.e. the
decommitment phase has occurred in the main execution), the simulator has a value and
random tape for an honest committer; for those that have not yet been opened, using
the adaptive security of the protocol, the simulator simply samples a random value from
distribution D (since we consider only i.i.d. values for left interactions) and generates
a random tape for an honest committer consistent with the transcript so far. This stands
in contrast to extracting only from safe-points as in [32].

The proof proceeds using a hybrid argument, where in hybrid experiment $H_i$ all
puzzle interactions are simulated and the first i left commitments to complete the preamble
phase are equivocated. It will follow from the soundness of the UC-puzzle and statistical
simulation that the simulation is correct in $H_0$. First, we show that in $H_0$, the value
extracted in any particular right interaction from a random WIPOK is the value
decommitted to by the adversary. This follows from the fact that for the adversary to equivocate,
it must know the solution to the UC-puzzle, and this violates the statistical simulation
and soundness condition of the puzzle. We show the following properties for every i,
and the proof of correctness follows using a standard hybrid argument.

- If the value extracted in any particular right interaction from a random WIPOK
is the value decommitted to by the adversary in $H_{i-1}$, then the value extracted
from a random WIPOK and the value extracted from the safe point of that right
interaction w.r.t. the i-th left interaction are the same and equal to the decommitment.
We show this by carefully considering another sequence of hybrids that yields an
adversary that violates the soundness of the UC-puzzle in an execution where the
puzzles are not simulated. This relies on the fact that the simulator simulates the left
interactions in the rewindings using the honest committer strategy, and on the
pseudo-randomness of the non-interactive commitment scheme used in the Commitment phase.

- If the value extracted from the safe point is the decommitment in $H_{i-1}$, then the
same holds in $H_i$. We rely on the proof technique of [32] through safe-points to
establish this. In slightly more detail, we show that for any particular right
interaction, the value extracted from the safe-point w.r.t. the i-th left interaction does not
change when the i-th left commitment is changed from an honest commitment to
an equivocal commitment. Recall that a safe-point can be used to extract the value
committed to on the right without rewinding the particular left interaction. Since the
non-interactive commitment scheme used has pseudo-random commitments, an
adversary cannot distinguish whether it is receiving an honest or equivocal commitment in
the i-th interaction.

- If the value extracted in the right interaction from the safe point is the value
decommitted to by the adversary in $H_i$, then the value extracted from a random WIPOK
and the safe point are the same and equal to the decommitment in $H_i$. This is
established exactly as the first property.

See the full version m for the formal construction and proof. 

4.2 Adaptive UC-Secure Commitment Scheme 

We now provide the construction of a UC-commitment scheme. First, we recall the
construction of the adaptive UC-secure commitment in the common reference string
(CRS) model from m to motivate our construction. In the m construction, the CRS
contains two strings. The first string consists of a random image y = f(x) of a one-way
function f, and the second string consists of a public key for a CCA-secure encryption
scheme. The former allows a simulator to equivocate the commitment when it knows x,
and the public key allows the simulator to extract committed values from the adversary
using its knowledge of the corresponding secret key. The additional CCA requirement
ensures non-malleability.

Our construction follows a similar approach, with the exception that instead of having
a common reference string generated by a trusted party, we use the equivocal
non-malleable commitment to generate two common reference strings between every pair
of parties: one for equivocation and the other for extraction. This is achieved by running
the following “non-malleable” coin-tossing protocol between an initiator and a
responder. Let $(S_{com}, R_{com})$ be a concurrent equivocal non-malleable commitment scheme and
$(S_{puz}, R_{puz})$ be a UC-puzzle.

1. The initiator commits to a random string $r^0$ using $(S_{com}, R_{com})$ to the responder.

2. The responder chooses a random string $r^1$ and sends it to the initiator.

3. The initiator opens its commitment and reveals $r^0$.

4. The output of the coin toss is $r = r^0 \oplus r^1$.

The coin-tossing protocol is run between an initiator and a responder and satisfies the
following two properties: (1) For all interactions where the initiator is honest, there is a
way to simulate the coin-toss. This follows directly from the equivocability of the
commitment scheme $(S_{com}, R_{com})$. (2) For all interactions where the initiator is controlled
by the adversary, the coin-toss generated is uniformly random. This follows from the
simulation-extractability of the commitment scheme.
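The message flow of the four-step coin toss can be sketched as follows. This is a minimal illustration of the commit, respond, open, XOR pattern only: the hash-based commitment below is a hypothetical stand-in that is neither equivocal nor non-malleable, so it does not provide the two properties claimed above. The names `commit`, `verify`, `xor`, `coin_toss` and the length `N_BYTES` are assumptions made for the example.

```python
import hashlib
import secrets
from typing import Tuple

N_BYTES = 16  # length of the coin-toss output in bytes (hypothetical choice)


def commit(value: bytes) -> Tuple[bytes, bytes]:
    """Stand-in commitment: SHA-256 over a fresh nonce and the value."""
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(nonce + value).digest(), nonce


def verify(commitment: bytes, value: bytes, nonce: bytes) -> bool:
    """Check that (value, nonce) is a valid opening of the commitment."""
    return hashlib.sha256(nonce + value).digest() == commitment


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def coin_toss() -> bytes:
    # Step 1: the initiator commits to a random string r0.
    r0 = secrets.token_bytes(N_BYTES)
    com, nonce = commit(r0)
    # Step 2: the responder, seeing only com, replies with a random string r1.
    r1 = secrets.token_bytes(N_BYTES)
    # Step 3: the initiator opens; the responder checks the opening.
    if not verify(com, r0, nonce):
        raise ValueError("initiator sent an invalid opening")
    # Step 4: both parties output r0 XOR r1.
    return xor(r0, r1)


shared = coin_toss()
```

The ordering matters: because the responder picks `r1` after seeing only the commitment, and the initiator is bound to `r0` before seeing `r1`, neither party alone should be able to fix the output when the commitment is hiding and binding.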

Using the coin-tossing protocol we construct an adaptive UC-commitment scheme.
First, the sender and receiver interact in two coin-tossing protocols: one where the
sender is the initiator, with outcome $coin_1$, and the other where the receiver is the
initiator, with outcome $coin_2$. Let x be the statement that $coin_1$ is in the image of a
pseudo-random generator G. Also, let PK = oGen($coin_2$) be a public key for the simulatable
encryption scheme (Gen, Enc, Dec, oGen, oRndEnc, rGen, rRndEnc). To commit to a
string $\beta$, the sender sends a commitment to $\beta$ using the non-interactive language-based
commitment scheme with statement x, along with strings $S_0$ and $S_1$, where one of the
two strings (chosen at random) is an encryption of the decommitment information to $\beta$ and
the other string is output by oRndEnc. In fact, this is identical to the construction
in m, with the exception that a simulatable encryption scheme is used instead of a
CCA-secure scheme.

Binding follows from the soundness of the adaptive UC-puzzle, and hiding follows
from the hiding property of the non-interactive commitment scheme and the semantic
security of the encryption scheme. It only remains to show that the scheme is
concurrently equivocable and extractable. To equivocate a commitment from an honest
committer, the simulator manipulates $coin_1$ (as the honest party is the initiator) so that
$coin_1 = G(s)$ for a random string s, and then equivocates by equivocating the
non-interactive commitment and encrypting the decommitment information for both bits
0 and 1 in $S_b$ and $S_{1-b}$ (where b is chosen at random). To extract a commitment
made by the adversary, the simulator manipulates $coin_2$ so that $coin_2 = \mathrm{rGen}(r)$ and
(PK, SK) = Gen(r) for a random string r. Then it extracts the decommitment
information in the encryptions sent by the adversary using SK.

The procedure described above works only if the adversary does not encrypt the
decommitment information for both 0 and 1 even when the simulator is equivocating.
At a high level, this follows since, if the coin-toss $coin_1$ cannot be manipulated by
the adversary when it is the initiator, then $coin_1$ is not in the range of G with very
high probability, and hence the adversary cannot equivocate (equivocating implies a
witness can be extracted that proves that $coin_1$ is in the range of G). Proving this turns
out to be subtle, and an intricate analysis relying on the simulation-extractability of the
$(S_{com}, R_{com})$-scheme is required.
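The counting step behind "not in the range of G with very high probability" can be written out explicitly. The bound below is a sketch under assumptions that the text does not state: that G is length-doubling, $G : \{0,1\}^n \to \{0,1\}^{2n}$, and that $coin_1$ is uniform over $\{0,1\}^{2n}$ whenever the adversary cannot bias it.

```latex
\Pr_{\mathit{coin}_1 \leftarrow \{0,1\}^{2n}}
  \bigl[\exists\, s \in \{0,1\}^n : G(s) = \mathit{coin}_1\bigr]
  \;\le\; \frac{\bigl|\{G(s) : s \in \{0,1\}^n\}\bigr|}{2^{2n}}
  \;\le\; \frac{2^{n}}{2^{2n}}
  \;=\; 2^{-n}.
```

With a different stretch for G the same argument gives a bound of $2^{n - |G(s)|}$, so any stretch that is superlogarithmic in n keeps the probability negligible.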

We use a “non-malleable” coin-toss protocol to generate two keys, one for
equivocation and another for extraction. Such an idea has been pursued before; for instance,
in E), they use a coin-toss to generate keys for extraction and equivocation. However,
they use a single coin-toss and, depending on which party is corrupt, the simulation
yields an extraction or equivocation key. In recent and independent work, Garg and
Sahai m show how to achieve stand-alone adaptively-secure multiparty computation
in the plain model (assuming no setup) using non-black-box simulation. They rely on
a coin-tossing protocol using equivocal commitments to generate a common random
string and then rely on previous techniques used in the uniform reference string model
m to securely realize any functionality. An important difference between their
approach and ours is that while our construction relies on a single trapdoor, they require
the trapdoors to be non-malleable.⁵ See Figure 2 for a formal description of our protocol.
(For further details and the proof, we refer the reader to the full version H2.)


5 Puzzle Instantiations 

By Theorem 1, it suffices to present an adaptive UC-puzzle in a given model to
demonstrate feasibility of adaptive and UC-secure computation. We first give some brief
intuition on the construction of adaptive UC-puzzles in various models. Formal
constructions and proofs are found in the full version El.

In the Common reference string (CRS) model, the Uniform reference string (URS)
model and the Key registration model, the puzzles are identical to the ones presented
in ED for the static case, where the puzzle interactions essentially consist of a call
to the corresponding ideal setup functionalities. Thus, in these models, the simulator is
essentially handed the trapdoor for the puzzle via its simulation of the ideal
functionality, and the puzzles are non-interactive. In the Timing model and the Partially Isolated
Adversaries model, we rely on essentially the same puzzles as m; however, we need
to modify the simulator to accommodate adaptive corruption by the adversary.

Constructing adaptive UC-puzzles in the Sunspots model is less straightforward, and
so we give more detail here. Simulated reference strings r in the Sunspots model have
Kolmogorov complexity smaller than k. Thus, as in ED, the puzzle sender and receiver
exchange 4 strings $(v_1, c_1, v_2, c_2)$. We then let $\Phi'$ denote the statement that $c_1, c_2$ are
commitments to messages $p_1, p_2$ such that $(v_1, p_1, v_2, p_2)$ is an accepting transcript of
5 In [19], they use separate keys for each party, and in [26], the trapdoors admit a
“simulation-soundness” property.

Protocol (S, R): Input: The sender S has a bit $\beta$ to be committed to.

Preamble:

- An adaptive UC-puzzle interaction $(S_{puz}, R_{puz})$ on input $1^n$ where R is the receiver and
  S is the sender. Let $\mathrm{TRANS}_1$ be the transcript of the messages exchanged.

- An adaptive UC-puzzle interaction $(S_{puz}, R_{puz})$ on input $1^n$ where S is the receiver and
  R is the sender. Let $\mathrm{TRANS}_2$ be the transcript of the messages exchanged.

Commit phase:

Stage 1: S and R run a coin-tossing protocol to agree on strings PK and CRS:

  Coin-toss to generate PK:

  1. The parties run protocol $(S_{com}, R_{com})$ with common input $\mathrm{TRANS}_1$. R plays the
     part of sender with input a random string $r_R$.

  2. S chooses a random string $r_S$ and sends it to R.

  3. R opens its commitment and reveals $r_R$.

  4. The output of the coin toss is $r = r_S \oplus r_R$. S and R run oGen(r) to obtain the
     public key PK.

  Coin-toss to generate CRS:

  1. The parties run protocol $(S_{com}, R_{com})$ with common input $\mathrm{TRANS}_2$. S plays the
     part of sender with input a random string $r_S$.

  2. R chooses a random string $r_R$ and sends it to S.

  3. S opens its commitment and reveals $r_S$.

  4. The output of the coin toss is $x = r_S \oplus r_R$.

Stage 2:

  1. The parties run $(S_{eq}, R_{eq})$ with common input x to generate a commitment
     $C = \mathrm{EQCom}_x(\beta; \rho)$, where S plays the part of $S_{eq}$ with input bit $\beta$.

  2. S chooses $b \in \{0, 1\}$ at random and sends to R the strings $(S_0, S_1)$, where:

     - $S_b$ is an encryption of the decommitment information of C (to bit $\beta$) under PK.

     - $S_{1-b}$ is generated by running oRndEnc(PK, $\rho_{Enc}$), where $\rho_{Enc}$ is chosen
       uniformly at random.

Reveal phase:

  1. S sends $\beta$, b, and the randomness used to generate $S_0, S_1$ to R.

  2. R checks that $S_0, S_1$ can be reconstructed using $\beta$, b and the randomness produced
     by S.

Fig. 2. The Adaptive Commitment Protocol (S, R)

a Universal argument of the statement $\Phi = \mathrm{KOL}(r) < k$. Note that since we require
statistical and adaptive simulation of puzzles, the commitment scheme used must be
both statistically hiding and “obliviously samplable” (i.e., there is a way to generate
strings that are statistically indistinguishable from commitments, without “knowing”
the committed value).

To construct an adaptive puzzle for the bounded-concurrent model we follow an 
approach similar to the sunspots model combined with the bounded-concurrent non 
black-box zero-knowledge protocol of BarakQ]]. In fact this is inspired by the stand 
alone adaptive secure multiparty computation construction of Garg, et al, m. 
Abstract. The Even-Mansour (EM) encryption scheme received a lot
of attention in the last couple of years due to its exceptional simplicity
and tight security proofs. The original 1-round construction was naturally
generalized into r-round structures with one key, two alternating
keys, and completely independent keys. In this paper we describe the
first key recovery attack on the one-key 3-round version of EM which is
asymptotically faster than exhaustive search (in the sense that its running
time is $o(2^n)$ rather than $O(2^n)$ for an n-bit key). We then use the
new cryptanalytic techniques in order to improve the best known attacks
on several concrete EM-like schemes. In the case of LED-128, the best
previously known attack could only be applied to 6 of its 12 steps. In this
paper we develop a new attack which increases the number of attacked
steps to 8, is slightly faster than the previous attack on 6 steps, and uses
about a thousand times less data. Finally, we describe the first attack on
the full AES² (which uses two complete AES-128 encryptions and three
independent 128-bit keys, and looks exceptionally strong) which is about
7 times faster than a standard meet-in-the-middle attack, thus violating
its security claim.

Keywords: Cryptanalysis, key recovery attacks, iterated Even-Mansour,
LED encryption scheme, AES² encryption scheme.


1 Introduction 

The Even-Mansour (EM) block cipher was first proposed at Asiacrypt 1991 [§].
It uses a single publicly known random permutation P on n-bit values and two
secret n-bit keys $K_1$ and $K_2$, and defines the encryption of the n-bit plaintext
m as $E(m) = P(m \oplus K_1) \oplus K_2$. The decryption of the n-bit ciphertext c is
similarly defined as $D(c) = P^{-1}(c \oplus K_2) \oplus K_1$. It can be naturally generalized
into an r-round iterated EM encryption function (a.k.a. a key-alternating scheme
in fli), which is defined using r permutations $P_1, P_2, \ldots, P_r$ and r + 1 keys
$K_1, K_2, \ldots, K_{r+1}$ as $E(m) = P_r(\ldots P_2(P_1(m \oplus K_1) \oplus K_2) \oplus K_3 \ldots \oplus K_r) \oplus K_{r+1}$,
where decryption is defined in an analogous way.
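
The definition above can be sketched in a few lines of Python. This is only an illustrative toy, assuming small lookup-table permutations generated by a seeded shuffle; the names `make_permutation`, `em_encrypt` and `em_decrypt`, the 8-bit block size and the example keys are hypothetical choices and do not come from the paper.

```python
import random

def make_permutation(n_bits, seed):
    """Return (P, P_inverse) as lookup tables for a pseudo-random permutation."""
    rng = random.Random(seed)
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse

def em_encrypt(perms, keys, m):
    """E(m) = P_r(... P_2(P_1(m ^ K_1) ^ K_2) ...) ^ K_{r+1}."""
    state = m ^ keys[0]
    for (p, _), k in zip(perms, keys[1:]):
        state = p[state] ^ k
    return state

def em_decrypt(perms, keys, c):
    """Undo the rounds in reverse order using the inverse permutations."""
    state = c
    for (_, p_inv), k in zip(reversed(perms), reversed(keys[1:])):
        state = p_inv[state ^ k]
    return state ^ keys[0]

# Toy parameters (assumptions of this sketch): 8-bit blocks, r = 3 rounds.
N_BITS = 8
perms = [make_permutation(N_BITS, seed) for seed in (1, 2, 3)]
independent_keys = [0x3A, 0xC5, 0x0F, 0x99]   # r + 1 independent keys
one_key = [0x5C] * 4                          # the one-key variant reuses K
```

With these tables, `em_decrypt(perms, keys, em_encrypt(perms, keys, m))` returns `m` for every 8-bit `m`, for both key schedules.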

For about 20 years this scheme received very little attention in the crypto- 
graphic literature, but in the last couple of years it became a very active research 
area: multiple papers about this scheme appeared at Crypto, Eurocrypt, Asi- 
acrypt, CHES and FSE QSS EMI , analyzing its theoretical properties, 
generalizing it in various ways, and proposing concrete constructions of block 
ciphers which are based on the EM structure. 

In this paper we describe several new key recovery attacks on iterated EM 
schemes, analyze their complexity, and apply them to some concrete proposals 
of block ciphers which have this structure. The origin of the observations used 
in our attacks goes back to the first paper which attacked EM, by Daemen @ 
in 1991. Daemen observed that in single-round one-key EM, an attacker can use 
the fact that the XOR of the unknown input and output of the permutation P 
is equal to the known XOR of the plaintext and the ciphertext. This observation 
can be used to break 1-round EM significantly faster than exhaustive search. 

At FSE'13, Nicolic et al. [16] extended the basic observation considerably.
They considered the graph of the function $P'(x) = x \oplus P(x)$,¹ and showed
that vertices with a large in-degree in this graph can be exploited to bypass an
additional round of EM, but at the expense of enlarging the time complexity to
slightly less than exhaustive key search.

In this paper, we develop the techniques one step further, and show that
graphs of the functions $P'_1$ and $P'_3$ (corresponding to the permutations $P_1$ and $P_3$)
can be deployed simultaneously, resulting in an attack on 3-round EM. However,
this enhancement is not sufficient by itself, since the time complexity becomes
very close to that of exhaustive key search. Nevertheless, a surprising feature of
our 3-round attack is that it has about the same time complexity as the 2-round
attack. This feature is due to a novel filtering technique based on tailor-made
linear subspaces that we develop in Section 2.2, which allows us to quickly dispose
of data which is useless for our attack. Another novel technique that we develop
in this paper allows us to adapt the differential-based attack of [15] (which was
originally applied to 2-round iterated EM with one key) to 2-round iterated EM
with completely independent keys, and thus to attack the full AES² scheme.
While the attack of [15] makes use of plaintext pairs with a fixed difference,
we notice that in its original form it cannot improve the standard
meet-in-the-middle attack on this scheme. In our attack, we work on non-standard
structures of plaintext triplets which allow us to filter out wrong guesses for the
key more efficiently.

Throughout the paper we follow the standard conventions in the analysis of 
time and memory complexities. Our basic unit of memory is an n-bit block. 
Our basic unit of time is a single evaluation of the encryption or the decryption 
function, i.e., the full r-round iterated EM scheme. The scheme requires the 
evaluation of the r permutations Pi (which are assumed to be heavy operations) 

1 In [16], the permutation P is actually the full encryption function, and thus x is a
message and P(x) is its corresponding ciphertext.


Key Recovery Attacks on 3-round Even-Mansour 339 


and a small number of simple operations (such as XORs) which are assumed to
require negligible time.² Thus, an invocation of a single permutation $P_i$ (or its
inverse) costs 1/r time units. For the sake of convenience, we often partition the
attack into an offline preprocessing phase which analyzes the properties of the
public $P_i$'s, and an online attack phase which analyzes the given plaintexts and
ciphertexts. However, we always define the time complexity of the attack as the
sum of the complexities of its offline and online phases. This is different from
the model used by Hellman in his time/memory tradeoff attack, which allowed
unlimited free preprocessing and considered only the online complexity (note
that in our model, Hellman's attack is no better than exhaustive search). To
prevent other types of “cheating”, we always add the time required to generate
the data to the final time complexity, and add the space required to hold the
data to the final space complexity.

All our attacks are only slightly better than exhaustive search, which raises the
natural question whether they should be considered as legitimate attacks. This
is a general problem in cryptanalysis, since it is difficult to decide whether an
attack such as the Biclique attack on AES-128 [dj which requires $2^{126}$ time really
“breaks” a scheme whose exhaustive search requires $2^{128}$ time. Some researchers
suggested that this issue should be decided by the nature of the attack: If an
attack on an n-bit scheme has an outer loop which tries $2^n$ different possibilities,
but performs for each one of them an operation which is cheaper than a single
encryption, then the attack should be called an “improvement of exhaustive
search” rather than a “real attack”, and the scheme is not said to be “broken” by
it. However, this is a fragile definition since the same attack can be described in
multiple ways, and it is not always clear whether it tries $2^n$ or fewer possibilities.

Fortunately, in cryptographic schemes such as EM which can be naturally 
defined for arbitrarily large key sizes n, we can avoid this fragility by analyz- 
ing the asymptotic complexity of the attack. As we show in this paper, our 
attacks are about n/ log(n) times faster than exhaustive search. Since this ratio 
is unbounded when n increases, our attacks are asymptotically better than any 
standard or improved version of exhaustive search, and this is a robust state- 
ment since it ignores all the multiplicative constants which are associated with 
a particular model of computation. 

Some of the concrete schemes we consider in this paper (such as LED and 
AES²) pose the following problem: they use the general EM framework, but
instantiate P with a fixed-key AES-like permutation which is defined only for 
a few values of n, and thus it is difficult to define their asymptotic security. 
We solve this problem in two ways. First, we observe that all our attacks are 
completely generic, and do not exploit any particular properties of P besides 
its randomness. We can thus analyze the performance of our attacks assuming 


2 This complexity gap is typically large for normal choices of n, and likely to grow 
even larger as n increases: the number of 2-bit to 1-bit gates in the Boolean circuit 
of Pi which are needed to thoroughly and independently mix the n input bits into n 
output bits is expected to grow super-linearly with n, whereas the number of gates 
in the Boolean circuit of XOR grows only linearly with n. 
that AES is replaced in these schemes by a random permutation over n-bit
values, and show that their asymptotic time complexity is $o(2^n)$. In addition, we
carefully analyze the exact complexity of our attacks for the particular values 
of n recommended for these schemes, and show that they are between 7 and 20 
times faster than exhaustive search, depending on which scheme we attack. 

We would like to point out that some of the previously published attacks 
on these schemes (such as 1) are distinguishing attacks, and thus they are 
incomparable to our key recovery attacks. In addition, our attacks may fail to 
find the key or require longer than expected time for a small fraction of “bad” 
permutations, since we only analyze their expected behavior when the Pi’s are 
randomly chosen permutations. 

The paper is organized as follows. In Section 2, we introduce our new crypt- 
analytic techniques, and use them to attack the one-key, three-round version of 
EM (the best previous attack could only handle the two-round version of EM). 
In particular, our new attack influenced the decision of the designers of the Zorro
block cipher Eil to increase its number of steps from 3 to 6. In Section 3, we
consider the LED block cipher, which was proposed at CHES 2011 [12]. It has
two flavors: a one-key version called LED-64, and a two-key version (in which
the two keys are alternately used) called LED-128. In the case of LED-64, the
best previously published attack [L| appeared at ACISP 2012, and could only
handle 2 steps. We increase the number of steps we can attack from 2 to 3. In
the case of LED-128, the best previously published attack [16] appeared at FSE
2013, and could handle 6 steps out of the 12 steps of full LED-128. We increase
this number to 8, using smaller time and data complexities. In Section 4, we
consider the generalized version of EM in which all the keys are completely
independent, and show how to attack the 2-round version of this scheme. We then
use the new techniques in order to describe the first published attack on the full
version of the block cipher AES², which was presented at Eurocrypt 2012 by
[Bj]. The scheme looks exceptionally strong, using two complete AES encryptions
and three independent 128-bit keys. In fact, the designers of AES² conjectured
that the best attack on their scheme is a meet-in-the-middle attack, but our new
attack disproves this claim since it is about 7 times faster.

2 Attacks on Iterated Even-Mansour with One Key 

We first consider iterated EM schemes with one key K and r permutations
$P_1, P_2, \ldots, P_r$, as shown in Figure 1 (note that if all the permutations are also
the same, the scheme is extremely vulnerable to slide attacks 0]). Our goal is to
use properties of one of the public permutations $P \in \{P_1, P_2, \ldots, P_r\}$ in order to
deduce properties of the associated keyed permutation³ $Q(K, x) = K \oplus P(x \oplus K)$
(used inside the EM construction), which hold for any value of K. As Daemen
pointed out in 1991, for any value of K and in any invocation of $Q(K, x)$, the
XOR of its input and output is equal to the XOR of the input and output of the
internal P function in the same invocation, i.e., $x \oplus Q(K, x) = (x \oplus K) \oplus P(x \oplus K)$.

3 In general, given some public permutation $P_i$, we denote $Q_i(K, x) = K \oplus P_i(x \oplus K)$.
Another interesting observation is that when K is unknown we cannot determine
$x \oplus K$, but the addition of K just renames the input vertices in the bipartite
graph of $P'(x) = x \oplus P(x)$, and thus it preserves the distribution of in-degrees
of its output vertices. In particular, if some output values of $P'$ are more likely
than expected (i.e., appear more than the average), then we can predict the value
$Q(K, x)$ with a higher probability than expected even when K is unknown. More
specifically, any t-way collision on the value v in $P'$, namely $x_1, x_2, \ldots, x_t$ such
that $x_1 \oplus P(x_1) = x_2 \oplus P(x_2) = \ldots = x_t \oplus P(x_t) = v$ for some value of v,
yields a t-way collision on the value v in the function
$Q'(K, x) = x \oplus Q(K, x) = x \oplus K \oplus P(x \oplus K)$.
Assume that indeed we manage to find during a preprocessing
phase a large t-way collision in the public $P'(x)$ on the output value v. Since it
also yields a t-way collision on the value v in the keyed function $Q'(K, x)$, there
are at least t values of x for which $Q'(K, x) = v$, and thus $Q(K, x) = x \oplus v$.
Consequently, we can guess $Q(K, x)$ with a probability which is t times higher
than the expected $1/2^n$ even when we know nothing about K.
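
The renaming argument can be checked numerically. The following Python sketch is an illustration under stated assumptions: a 10-bit toy permutation built from a seeded shuffle, with hypothetical helper names (`p_prime`, `q`, `q_prime`, `collision_under_key`) that are not taken from the paper. It finds the most popular value v of P' and confirms that XORing the colliding inputs with any key K gives inputs on which Q' also takes the value v.

```python
import random
from collections import defaultdict

N_BITS = 10                      # toy block size (an assumption of this sketch)
SIZE = 1 << N_BITS

rng = random.Random(7)
P = list(range(SIZE))
rng.shuffle(P)                   # stand-in for a public random permutation

def p_prime(x):
    """P'(x) = x ^ P(x)."""
    return x ^ P[x]

def q(key, x):
    """Keyed permutation Q(K, x) = K ^ P(x ^ K)."""
    return key ^ P[x ^ key]

def q_prime(key, x):
    """Q'(K, x) = x ^ Q(K, x)."""
    return x ^ q(key, x)

# Group all inputs by their P' value and keep the largest collision.
preimages = defaultdict(list)
for x in range(SIZE):
    preimages[p_prime(x)].append(x)
v, xs = max(preimages.items(), key=lambda item: len(item[1]))
t = len(xs)

def collision_under_key(key):
    """Inputs on which Q'(K, .) equals v: the P' preimages renamed by K."""
    return sorted(x ^ key for x in xs)
```

For every key, `collision_under_key(key)` has exactly `t` elements, and each of them satisfies `q(key, x) == x ^ v`, which is the prediction used by the attacks.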



Fig. 1. An iterated EM with one key 

This graph theoretic property is strongly related to the one used in [16], but
we use it in a different way. Whereas we use properties of the public permutations
(which can be observed during a preprocessing phase), [16] exploits properties
of the given plaintext-ciphertext pairs: assume that $m_j \oplus c_j = v$ for multiple
plaintext-ciphertext pairs $(m_j, c_j)$. Then, for all of these pairs, they know that
$(m_j \oplus K) \oplus (c_j \oplus K) = v$. Thus, the attack of [16] is based on the property
that the XOR of the inputs to the first and last public permutations $P_1$ and
$P_r^{-1}$ attains the value v more than the expected number of times. In particular,
in their attack it is not clear how to compute such a v during a preprocessing
phase, and they have to wait for the actual data in order to search for the best v
in it. Our attacks, on the other hand, are based on the property that the XOR of
the input and output of a single public permutation attains some value v more
than the expected number of times, and thus we can find the best v once and
for all, before any data is given for a particular key.

In order to estimate the highest expected in-degree in the bipartite graph of
$P'(x) = x \oplus P(x)$, we assume that for a random choice of the permutation P,
the function $P'$ behaves as a random function. This is not completely true, since
there are some extremely expensive ways to distinguish between such cases (for
example, the XOR of all the $2^n$ values of $P'$ is zero, whereas the XOR of all the
outputs of a truly random function is unlikely to be zero). However, it is easy to
verify with appropriate simulations that the in-degree distributions of the two
models behave almost identically, which is all we need in our attack.⁴

The main problem in applying this attack is that going over all the $2^n$ possible
values of x in order to find the most popular v will make our attack slower
than exhaustive search (since we do not allow free preprocessing in our model).
Fortunately, we can find vertices $v'$ which are almost as popular by trying only
a small subset $X \subset \{0, 1\}^n$ of possible inputs. We denote this restricted function
by $f|_X$, and note that it induces a subgraph in the bipartite graph associated
with f, in which the left side of the graph contains only the vertices in X. Our
goal now is to analyze the expected distribution of the in-degrees in random
subgraphs of random functions.

Random functions have been extensively analyzed in the literature (e.g., see
0). It is well-known that the in-degree of an element in the range of $f|_X$ is
distributed according to the Poisson distribution with an expectation $\lambda$, which
is equal to the average in-degree (i.e., $\lambda = |X|/2^n$, which is the ratio between the
sizes of the domain and range of $f|_X$). Given a parameter t, the probability that
an arbitrary element v will have an in-degree of t is thus $(\lambda^t e^{-\lambda})/t!$. We have
$2^n$ elements in the range, implying that we expect that about $(2^n \cdot \lambda^t e^{-\lambda})/t!$
vertices will have an in-degree of t. If we equate this number to 1 and ignore
low order terms, we can deduce that the largest expected in-degree t satisfies
$t \cdot \log(t) = n$, and thus t is approximately equal to $n/\log(n)$. The crucial point
is that this highest in-degree grows in an unbounded way as n increases, and
thus any complexity of the form $O(2^n/t)$ behaves asymptotically as $o(2^n)$. If we
reduce this maximal t to $t - i$ for a small i, we expect to find about $(t/\lambda)^i$ vertices
which have this reduced in-degree. Since $t > 1$ and $\lambda < 1$, this number grows
exponentially with i, and we can thus find a huge number of vertices which have
almost maximal in-degrees.

To get a sense of the concrete values implied by this distribution, consider the
recommended value of n = 64 in the LED block cipher. If we consider all the
$2^{64}$ possible inputs, we expect to see 2 or 3 vertices of degree 20, 55 vertices of
degree 19, and 1060 vertices of degree 18. If we reduce the number of possible
inputs to $2^{63}$, we expect to see 1 vertex of degree 17, 8 vertices of degree 16,
and 260 vertices of degree 15. If we further reduce the number of possible inputs
to $2^{60}$, we expect to see 4 vertices of degree 10, 695 vertices of degree 9, and
$100130 \approx 2^{16.6}$ vertices of degree 8.
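
The Poisson estimate is easy to evaluate numerically. The sketch below simply computes $(2^n \cdot \lambda^t e^{-\lambda})/t!$ in log space; the function names `expected_vertices` and `largest_expected_degree` are hypothetical, and the code only restates the formula from the text rather than any implementation by the authors.

```python
import math

def expected_vertices(n_bits, log2_inputs, t):
    """Expected number of range elements with in-degree exactly t.

    Evaluates 2^n * lam^t * exp(-lam) / t! with lam = |X| / 2^n,
    working with logarithms to avoid overflow.
    """
    lam = 2.0 ** (log2_inputs - n_bits)
    log_count = (n_bits * math.log(2.0) + t * math.log(lam)
                 - lam - math.lgamma(t + 1))
    return math.exp(log_count)

def largest_expected_degree(n_bits, log2_inputs):
    """Largest t for which at least one vertex of in-degree t is expected."""
    t = 1
    while expected_vertices(n_bits, log2_inputs, t + 1) >= 1.0:
        t += 1
    return t

if __name__ == "__main__":
    for log2_inputs, degrees in ((64, (20, 19, 18)), (60, (10, 9, 8))):
        for t in degrees:
            count = expected_vertices(64, log2_inputs, t)
            print(f"|X| = 2^{log2_inputs}, degree {t}: about {count:.1f} vertices")
```

For n = 64 the function gives roughly 2.8, 56 and 1060 vertices of degrees 20, 19 and 18 when all $2^{64}$ inputs are used, and roughly 4.3 and 695 vertices of degrees 10 and 9 for $2^{60}$ inputs, in line with the figures quoted above.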

The attacks in this paper are described in terms of several parameters, and it is 
usually possible to obtain various tradeoffs between their time, data and memory 
complexities by tweaking the parameter values. However, since there is no simple 
formula which describes the exact tradeoff curves, one needs to determine favor- 
able tradeoff points on the curves by plugging in a few values for the parameters 


4 In fact, collisions in $P'(x)$ are slightly less likely to occur when P is a random function,
since if $P(x) = P(y)$ (for $x \neq y$) then $P'(x) \neq P'(y)$, whereas if P is a random
permutation then $x \neq y$ implies $P(x) \neq P(y)$, and the probability for $P'(x) = P'(y)$
is a bit higher. As a result, our analysis slightly underestimates the highest expected
in-degree, and thus the attacks that we describe are actually (negligibly) faster.
and calculating the resultant complexities of the algorithms. This is demonstrated
in our attacks, where we suggest concrete points on the curves which minimize the 
time complexity, but stress that there are other options as well. 

2.1 Attacks on 2-Round Iterated Even-Mansour with One Key 

We start by describing a very basic attack, 2Round1KeyBasic. Let S and D be
parameters.

Preprocessing:

PR1. Evaluate $P'_1$ on an arbitrary subset of inputs X, such that |X| = S, and
store the output values (without their associated input values) in a sorted
list.

PR2. Traverse the sorted list and find the output $v_1$ which occurs the maximal
number of times (in $t_1$ consecutive locations).

Online:

O1. Ask for the encryption of D arbitrary plaintexts.

O2. For each plaintext-ciphertext pair $(m_i, c_i)$:

(a) Assume that $Q_1(K, m_i) = m_i \oplus v_1 = z_i$ and calculate $P_2(z_i)$.

(b) Test the suggestion for the key $K' = P_2(z_i) \oplus c_i$ by checking whether
indeed $Q_1(K', m_i) = m_i \oplus v_1$. If the test fails, increment i and return
to Step O2. Otherwise, return the suggested key.
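
As a rough illustration, the steps above can be sketched in Python on a toy 12-bit block. Everything specific here is an assumption of the sketch and not part of the original description: the block size, the seeds, the helper names, the use of a counter instead of a sorted list, and the final confirmation on a few extra known pairs (on such a small block a wrong key can pass the test of Step O2.(b) by chance, so the sketch re-encrypts some known plaintexts before accepting a candidate).

```python
import random
from collections import Counter

N_BITS = 12                      # toy block size; the paper targets n = 64
SIZE = 1 << N_BITS

def random_permutation(rng):
    perm = list(range(SIZE))
    rng.shuffle(perm)
    return perm

def encrypt(p1, p2, key, m):
    """2-round one-key EM: P2(P1(m ^ K) ^ K) ^ K."""
    return p2[p1[m ^ key] ^ key] ^ key

def q1(p1, key, x):
    """Keyed permutation Q1(K, x) = K ^ P1(x ^ K)."""
    return key ^ p1[x ^ key]

def preprocess(p1, inputs):
    """PR1-PR2: most frequent value v1 of P1'(x) = x ^ P1(x) over `inputs`."""
    counts = Counter(x ^ p1[x] for x in inputs)
    v1, t1 = counts.most_common(1)[0]
    return v1, t1

def online(p1, p2, v1, pairs, check_pairs):
    """O1-O2: stream over known pairs and return the first confirmed key."""
    for m, c in pairs:
        z = m ^ v1                       # (a) guess Q1(K, m) = m ^ v1
        cand = p2[z] ^ c                 # (b) key suggestion K'
        if q1(p1, cand, m) != z:         # test of Step O2.(b)
            continue
        # Extra confirmation on other known pairs (an addition of this sketch).
        if all(encrypt(p1, p2, cand, a) == b for a, b in check_pairs):
            return cand
    return None

# Toy run: the attacker is handed the whole codebook so that a hit is guaranteed.
rng = random.Random(2013)
p1, p2 = random_permutation(rng), random_permutation(rng)
key = rng.randrange(SIZE)
v1, t1 = preprocess(p1, range(SIZE // 2))          # S = 2^(n-1) inputs
pairs = [(m, encrypt(p1, p2, key, m)) for m in range(SIZE)]
recovered = online(p1, p2, v1, pairs, pairs[:3])
```

In this toy run `recovered` equals `key`. With fewer than $2^n$ known plaintexts the same loop succeeds only once some plaintext hits one of the $t_1$ good inputs.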

The time complexity of the preprocessing phase is S evaluations of $P_1$, and
its memory complexity is also S. Note that the output of the preprocessing
phase is only the value $v_1$ and the corresponding number $t_1$, and we can discard
the rest of the sorted list (in our model, we can ignore the sorting time of
the list, since sorting uses only cheap comparison operations⁵). In addition,
since we can execute the online phase in streaming mode by working on each
given plaintext-ciphertext pair independently and discarding it afterwards, its
memory complexity is negligible. The expected time and data complexities of
the online phase depend on the value of $t_1$: we know that there are at least $t_1$
values of x such that $Q_1(K, x) = x \oplus v_1$. According to the birthday paradox, after
trying about $2^n/t_1$ arbitrary messages we expect that at least one $m_i$ will satisfy
$Q_1(K, m_i) = m_i \oplus v_1$ and suggest the correct value of K. Thus, the expected
data complexity of the online algorithm is $2^n/t_1$, and in order to compute its
time complexity, we need to sum $2^n/t_1$ evaluations of $P_2$ in Step O2.(b), and
$2^n/t_1$ encryptions in order to generate the data.

5 One may notice that since sorting requires $O(n \log(n))$ basic operations, then our
algorithm actually requires about $2^n$ basic operations. However, as mentioned before
we expect the circuit size of any reasonable choice of $P_i$ to grow at least as $n^{1+\epsilon}$
(for some $\epsilon > 0$) when n increases, and thus the real time complexity of exhaustive
search is in fact $O(n^{1+\epsilon} \cdot 2^n)$ basic operations, which is asymptotically larger than
the number of basic operations performed by our algorithm when we take the sorting
time of $O(2^n/n)$ values into account.
Optimizing the Basic Algorithm. We now describe several useful optimizations
of the 2Round1KeyBasic algorithm. The first optimization is to use the
freedom to choose the subset X during the preprocessing phase in order to
immediately filter out most of the wrong key suggestions that are now filtered only
in Step O2.(b) of the online algorithm, and thus avoid the $Q_1$ evaluations in
these cases. The idea uses a technique that resembles (but is not the same as)
splice-and-cut [||: assume that we choose the set X of size S as the subspace of
values x in which the $n - \log(S)$ LSBs are zero (or any other constant). Then the
value of these $n - \log(S)$ LSBs in all the $t_1$ inputs x that satisfy $P'_1|_X(x) = v_1$ is
zero. Consequently, we know that for any plaintext $m_i$, if $m_i \oplus K$ is one of these
$t_1$ inputs, then the $n - \log(S)$ LSBs of K are equal to those of $m_i$. Thus, before
testing the suggested key in Step O2.(b), we can check whether its $n - \log(S)$
LSBs are equal to those of $m_i$, and otherwise discard it without evaluating $Q_1$.
We note that in this attack, the saving in time complexity due to this optimization
is small; however, in Section 2.2 we show that a similar idea yields a more
significant saving in our attacks on 3-round iterated EM. We alert the reader
that even though the values in X are now chosen in a specific way, the attack
remains a known plaintext attack since there is no restriction on the choice of
the $m_i$'s.

The second optimization is to consider $\ell > 1$ outputs of $P'_1$ with a high
in-degree instead of just one. This allows us to reduce the data complexity of
the attack at the expense of using more memory and slightly more time during
the online phase of the attack. Since the original online algorithm required
only negligible memory, this tradeoff seems favorable. Our optimized algorithm
2Round1KeyOpt is described below, using S, D and $\ell$ as parameters.

Preprocessing:

PR1. Evaluate $P'_1$ on a subset of S inputs, X, such that the $n - \log(S)$ LSBs of
each $x \in X$ are zero. Store the output values in a sorted list.

PR2. Traverse the sorted list and store the outputs $v_1, v_2, \ldots, v_\ell$ which have the
highest in-degrees. Denote the in-degrees of the outputs $v_1, v_2, \ldots, v_\ell$ by
$t_1, t_2, \ldots, t_\ell$, respectively.

Online:

O1. Ask for the encryption of D arbitrary plaintexts.

O2. For each plaintext-ciphertext pair $(m_i, c_i)$:

(a) For $j \in \{1, 2, \ldots, \ell\}$:

i. Assume that $Q_1(K, m_i) = m_i \oplus v_j = z_{ij}$ and calculate $P_2(z_{ij})$.

ii. Let $K' = P_2(z_{ij}) \oplus c_i$. If the $n - \log(S)$ LSBs of $K'$ are different
from those of $m_i$, discard it and return to Step O2.(a) (if $j = \ell$ return
to Step O2). Otherwise, test $K'$ by checking whether $Q_1(K', m_i) =
m_i \oplus v_j$. If the test succeeds, return $K'$; otherwise, if $j < \ell$ return
to Step O2.(a) and if $j = \ell$ return to Step O2.
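
The LSB filter used in Step O2.(a).ii can be illustrated in isolation. The following Python sketch is a minimal example under assumed toy parameters; the helper names `low_bits`, `structured_subset` and `passes_lsb_filter` are hypothetical. It builds the structured set X and checks that a candidate key survives the filter exactly when its low bits agree with those of the plaintext.

```python
def low_bits(value, count):
    """Return the `count` least significant bits of `value`."""
    return value & ((1 << count) - 1)

def structured_subset(n_bits, log2_s):
    """X = all n-bit values whose n - log2(S) least significant bits are zero."""
    shift = n_bits - log2_s
    return [high << shift for high in range(1 << log2_s)]

def passes_lsb_filter(candidate_key, m, n_bits, log2_s):
    """Keep K' only if its n - log2(S) LSBs equal those of the plaintext m.

    If m ^ K lies in X, the low bits of m ^ K are zero, so the low bits
    of K and m must coincide; any other candidate can be dropped cheaply.
    """
    count = n_bits - log2_s
    return low_bits(candidate_key, count) == low_bits(m, count)

# Toy parameters (assumptions of this sketch): n = 12 and S = 2^8.
N_BITS, LOG2_S = 12, 8
X = structured_subset(N_BITS, LOG2_S)
true_key = 0xABC
good_plaintexts = [x ^ true_key for x in X]   # plaintexts that map into X
```

The correct key always passes the filter for the plaintexts in `good_plaintexts`, while for a fixed plaintext only a fraction $S/2^n$ of all keys pass it (256 of the 4096 keys in this toy setting).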

As in 2Round1KeyBasic, the time complexity of the preprocessing
phase is S evaluations of $P_1$, and its memory complexity is also S. However, in
2Round1KeyOpt, a bigger list of size $\ell$ is carried over to the online algorithm,
and thus its memory complexity is increased to $\ell$. In order to calculate the time
and data complexities, we denote by t the average value of $t_1, t_2, \ldots, t_\ell$, and thus
there are $t\ell$ values of x for which $Q_1(K, x) = x \oplus v_j$ for $j \in \{1, 2, \ldots, \ell\}$.
According to the birthday paradox, after trying about $2^n/(t\ell)$ arbitrary messages,
we expect that at least one $m_i$ will satisfy $Q_1(K, m_i) = m_i \oplus v_j$ and suggest
the correct value of K. Thus, the expected data complexity of the attack is
$2^n/(t\ell)$. Since we perform $\ell$ evaluations of $P_2$ per given message, the expected
time complexity of the online algorithm is about $\ell \cdot D = 2^n/t$ evaluations of
$P_2$, $S/2^n \cdot 2^n/t = S/t$ evaluations of $P_1$ in Step O2.(a).ii, and $2^n/(t\ell)$ time to
generate the data.


Concrete Parameters. For n = 64, let $S = 2^{60}$, which implies $\lambda = 2^{60}/2^{64} = 2^{-4}$.
As shown before, by using the formula $(2^n \cdot \lambda^t e^{-\lambda})/t! = 2^{64} \cdot (2^{-4t} e^{-1/16})/t!$
with t = 10, it is easy to check that in such an evaluated subgraph of a random
function we expect to see at least $\ell = 4$ vertices with an in-degree of 10. With
these parameters, the time complexity of the preprocessing phase is $2^{60}$ evaluations
of $P_1$ (which is equivalent to $2^{59}$ evaluations of the 2-round scheme), and
its memory complexity is $2^{60}$. The memory complexity of the online algorithm
is negligible, its data complexity is $2^{64}/(10 \cdot 4) = 2^{58.7}$ known plaintexts and
its time complexity is $2^{64}/10$ evaluations of $P_2$ and $2^{60}/10$ evaluations of $P_1$,
which is equivalent to about $2^{59.8}$ time units. Adding the $2^{58.7}$ time required to
generate the data, we obtain a total time complexity of about $2^{60.4}$, which is
about 12 times faster than exhaustive search.

We can significantly reduce the data complexity by considering all the vertices
with an in-degree of at least 8, whose number $\ell$ is expected to exceed $2^{16}$. This
does not affect the time and memory complexities of the preprocessing phase.
The memory complexity of the online algorithm is now $2^{16}$ (which is still quite
small), its data complexity is $2^{64}/(8 \cdot 2^{16}) = 2^{45}$ known plaintexts and its time
complexity is now $2^{64}/8$ evaluations of $P_2$ and $2^{60}/8$ evaluations of $P_1$, which
is equivalent in total to about $2^{60.1}$ time units, or about 15 times faster than
exhaustive search (note that we actually gain in time complexity since we use
significantly less data).
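
The arithmetic behind these figures can be reproduced with a short Python sketch. It is only a worked calculation under the cost model stated earlier (one time unit is a full 2-round encryption, so a single permutation call costs 1/2), and it adds up only the online and data-generation terms that the two paragraphs above combine. The function name `online_costs_log2` is hypothetical.

```python
import math

def online_costs_log2(n_bits, log2_s, t, num_outputs, rounds=2):
    """Return log2 of (data, online time, online time + data generation).

    data   = 2^n / (t * l) known plaintexts
    online = (2^n / t evaluations of P2 + S / t evaluations of P1) / rounds
    """
    data = 2.0 ** n_bits / (t * num_outputs)
    p2_evals = 2.0 ** n_bits / t
    p1_evals = 2.0 ** log2_s / t
    online = (p2_evals + p1_evals) / rounds
    return math.log2(data), math.log2(online), math.log2(online + data)

if __name__ == "__main__":
    for t, num_outputs in ((10, 4), (8, 2 ** 16)):
        data, online, total = online_costs_log2(64, 60, t, num_outputs)
        speedup = 2.0 ** (64 - total)
        print(f"t = {t}: data 2^{data:.1f}, online 2^{online:.1f}, "
              f"total 2^{total:.1f}, about {speedup:.0f}x faster")
```

Up to rounding, the first setting gives $2^{58.7}$ data, $2^{59.8}$ online time and a sum slightly above $2^{60.3}$, and the second gives $2^{45}$ data and a sum of about $2^{60.1}$.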


2.2 Attacks on 3-Round Iterated Even-Mansour with One Key 

In the attacks on 2-round iterated EM with one key, we use properties of Pi in 
order to guess a value of Qi(K, x) with a higher probability than expected. We 
then apply to this guess the public permutation P2, which immediately gives us 
a suggestion for the key by XORing the obtained value with the ciphertext. In 
order to attack 3-round iterated EM with one key, we start with the same idea. 
However, after the evaluation of P2 , we cannot immediately get a suggestion for 
the key, as we still have to apply the complex operation of XOR’ing the unknown 
key, applying P3, and XOR’ing the unknown key again, before we can compare 
the result to the ciphertext. Nevertheless, we notice that given the value at the 
output of $P_2$, we reduce the key recovery problem to attacking a single-round
EM scheme with one key, to which we can apply the simple attack of [8]. Thus,
we run an additional preprocessing step which evaluates and stores in a sorted
list the values of $P'_3(x) = x \oplus P_3(x)$ for various inputs x. The sorted list is used
in the online algorithm in order to obtain suggestions for the key, as described
in the basic algorithm 3Round1KeyBasic below, which uses $S_1$, $S_3$ and D as
parameters.

Preprocessing: 

PR1. Evaluate P[ on an arbitrary subset of inputs X\ such that \X-[ \ = Si, and 
store the output values in a sorted list. 

PR2. Traverse the sorted list and find an output Vi with a maximal in-degree, 
denoted by ti . 

PR3. Evaluate P( on an arbitrary subset of inputs X 3 , such that | X A | = S 3 , and 
store the output values P'j(x) in a sorted list L 3 next to the corresponding 
value of Ps(x). 

Online: 

O1. Ask for the encryption of $D$ arbitrary plaintexts.

O2. For each plaintext-ciphertext pair $(m_i, c_i)$:

(a) Assume that $Q_1(K, m_i) = m_i \oplus v_1 = z_i$ and calculate $P_2(z_i)$.

(b) Look for the value of $P_2(z_i) \oplus c_i$ in $L_3$. If there is no match, return to
Step O2 and increment $i$.

(c) For each match of $P_2(z_i) \oplus c_i$, obtain the value of $P_3(x)$ (for which
$P_2(z_i) \oplus c_i = P'_3(x) = x \oplus P_3(x)$), and test the key suggestion $K' =
P_3(x) \oplus c_i$ by checking whether $Q_1(K', m_i) = m_i \oplus v_1$. If the test fails,
continue with the next match (if none remain, return to Step O2). Otherwise,
return the key.
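The steps of 3Round1Key Basic can be tried out on a toy instance. The following is a minimal sketch, not the authors' implementation: it assumes a 10-bit block, random lookup-table permutations, $S_1 = S_3 = 2^n$ and the full codebook as data, and it uses a dictionary in place of the sorted list $L_3$. All function names are hypothetical, and the final check against all known pairs is an extra safeguard added here on top of the $Q_1$ test of Step O2.(c).

```python
import random
from collections import Counter, defaultdict

N_BITS = 10                  # toy block size; the paper works with n = 64
SIZE = 1 << N_BITS


def random_permutation(rng):
    """Return a random permutation of {0, ..., SIZE-1} as a lookup table."""
    table = list(range(SIZE))
    rng.shuffle(table)
    return table


def encrypt(m, key, perms):
    """3-round iterated Even-Mansour with a single key."""
    p1, p2, p3 = perms
    return p3[p2[p1[m ^ key] ^ key] ^ key] ^ key


def preprocess(perms):
    p1, _, p3 = perms
    # PR1 + PR2: most common value v1 of P'_1(x) = x ^ P_1(x) and its in-degree t1.
    v1, t1 = Counter(x ^ p1[x] for x in range(SIZE)).most_common(1)[0]
    # PR3: index every x by P'_3(x) = x ^ P_3(x) (dict instead of a sorted list).
    l3 = defaultdict(list)
    for x in range(SIZE):
        l3[x ^ p3[x]].append(x)
    return v1, t1, l3


def attack(pairs, perms, v1, l3):
    p1, p2, p3 = perms
    for m, c in pairs:
        z = m ^ v1                        # O2.(a): guess Q_1(K, m) = m ^ v1
        w = p2[z]
        for x in l3.get(w ^ c, ()):       # O2.(b): matches of P_2(z) ^ c in L_3
            k = p3[x] ^ c                 # O2.(c): key suggestion K' = P_3(x) ^ c
            if p1[m ^ k] ^ k != z:        # test Q_1(K', m) = m ^ v1
                continue
            # Extra safeguard (not in the paper): confirm on all known pairs.
            if all(encrypt(mm, k, perms) == cc for mm, cc in pairs):
                return k
    return None


def demo(seed=2013):
    rng = random.Random(seed)
    perms = tuple(random_permutation(rng) for _ in range(3))
    secret = rng.randrange(SIZE)
    pairs = [(m, encrypt(m, secret, perms)) for m in range(SIZE)]
    v1, _, l3 = preprocess(perms)
    return secret, attack(pairs, perms, v1, l3), perms
```

With the full codebook and full lists, the plaintext $m = x_0 \oplus K$ for any preimage $x_0$ of $v_1$ always yields the correct suggestion, so the toy run cannot miss the key; with the reduced parameters of the paper, success is only expected on average.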

The time complexity of the preprocessing phase is $S_1$ evaluations of $P_1$ and $S_3$
evaluations of $P_3$, and its memory complexity is $\max(S_1, S_3)$. Note that we do
not need to store any of the values generated in the first step of the preprocessing
after Step PR2 terminates. The memory complexity of the online algorithm is
$S_3$. In order to calculate the expected time and data complexities of the online
algorithm, we notice that after we process $D$ pairs $(m_i, c_i)$, we expect that at
least $(t_1 \cdot D)/2^n$ of them satisfy $Q_1(K, m_i) = m_i \oplus v_1$, and consequently at least
$(t_1 \cdot D \cdot S_3)/2^{2n}$ pairs will be matched and suggest the correct value for the key
in Step O2.(c). Thus, in order to obtain a correct suggestion for the key, we
require $(t_1 \cdot D \cdot S_3)/2^{2n} = 1$, implying that the data complexity of the attack is
$D = 2^{2n}/(t_1 \cdot S_3)$. We expect a match in Step O2.(c) for a fraction of $S_3/2^n$ of
the $(m_i, c_i)$ pairs. Thus, we estimate the time complexity of the online algorithm
as $D = 2^{2n}/(t_1 \cdot S_3)$ evaluations of $P_2$, $S_3/2^n \cdot 2^{2n}/(t_1 \cdot S_3) = 2^n/t_1$ evaluations
of $P_1$, and $2^{2n}/(t_1 \cdot S_3)$ time required to generate the data.


Optimizing the Basic Algorithm. Similarly to our 2Round1KeyOpt attack,
we would like to use the freedom to choose the subset $X_1$ during preprocessing
in order to reduce the time complexity of the attack. However, in this attack we
will use this freedom in a different way: we “synchronize” the sets $X_1$ and $X_3$
such that we can instantly rule out most $(m_i, c_i)$ pairs (just by comparing bits
of $m_i$ and $c_i$) that do not simultaneously satisfy both $Q_1(K, m_i) = m_i \oplus v_1$ and
$P_3^{-1}(c_i \oplus K) \in X_3$. Thus, we can discard most pairs $(m_i, c_i)$ which will suggest
a wrong key (or suggest no key at all) with negligible computation.

We now assume that $|X_1| = |X_3| = S$. Similarly to the 2Round1KeyOpt
algorithm, we choose $X_1$ as a subspace of values $x$ in which the $n - \log(S)$ LSBs
are zero (or any other constant). This implies that for any plaintext $m_i$, if $m_i \oplus K$
is one of the $t_1$ inputs that satisfy $P'_{1|X_1}(x) = v_1$, then the $n - \log(S)$ LSBs of $K$
are equal to those of $m_i$. As for $x \in X_3$, we store the values of $P'_3(x) = x \oplus P_3(x)$,
and set the additional condition that the $n - \log(S)$ LSBs of $P_3(x)$ are zero (or
any other constant). In fact, during preprocessing, we do not evaluate $P_3(x)$ on
$x \in X_3$, but rather evaluate $P_3^{-1}(y)$ for each $y \in Y_3$, where $Y_3$ contains all $n$-bit
vectors whose $n - \log(S)$ LSBs are zero. Thus, we know that if $c_i \oplus K \in Y_3$, then
the $n - \log(S)$ LSBs of $K$ are equal to those of $c_i$. Combining the conditions
on $m_i$ and $c_i$, we know that a pair $(m_i, c_i)$ will suggest a correct key in our
algorithm only if the $n - \log(S)$ LSBs of $m_i$ and $c_i$ are equal.
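The LSB condition derived above amounts to a one-line filter. The sketch below is illustrative only: the helper names are hypothetical, and `right_pair` simply constructs a pair whose whitened values $m \oplus K$ and $c \oplus K$ both have zero low bits, which is the situation the synchronization is designed to detect.

```python
import random


def low_bits(value, count):
    """Return the `count` least significant bits of `value`."""
    return value & ((1 << count) - 1)


def survives_filter(m, c, n_bits, log2_s):
    """Keep (m, c) only if the n - log2(S) LSBs of m and c are equal."""
    zero_bits = n_bits - log2_s
    return low_bits(m, zero_bits) == low_bits(c, zero_bits)


def right_pair(key, n_bits, log2_s, rng):
    """Build (m, c) with m ^ K in X_1 and c ^ K in Y_3 (low bits zero)."""
    zero_bits = n_bits - log2_s
    x = rng.randrange(1 << log2_s) << zero_bits   # element of X_1
    y = rng.randrange(1 << log2_s) << zero_bits   # element of Y_3
    return x ^ key, y ^ key
```

Every pair built by `right_pair` passes the filter, whereas a pair with unrelated low bits passes only with probability $2^{-(n - \log(S))} = S/2^n$.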

Similarly to the 2Round1KeyOpt attack, the second optimization is to consider
$\ell > 1$ outputs of $P'_1$ with a high in-degree (instead of just one), which
allows us to reduce the data complexity of the attack. Our optimized algorithm
3Round1KeyOpt is described below, and Figure 2 illustrates its online part. Let
$S$, $D$ and $\ell$ be parameters.

Preprocessing: 

PR1. Evaluate $P'_1$ on a subset of $S$ inputs, $X$, such that the $n - \log(S)$ LSBs of
each $x \in X$ are zero. Store the output values in a sorted list.

PR2. Traverse the sorted list and store the outputs $v_1, v_2, \ldots, v_\ell$ with the highest
in-degrees. Denote the in-degrees of outputs $v_1, v_2, \ldots, v_\ell$ by $t_1, t_2, \ldots, t_\ell$,
respectively.

PR3. Let $Y_3$ be the subspace of the $S$ $n$-bit vectors in which the $n - \log(S)$
LSBs are zero. For each $y \in Y_3$, store $P_3^{-1}(y) \oplus y$ ($= P'_3(P_3^{-1}(y))$) in a sorted
list $L_3$ next to $y$.

Online: 

O1. Ask for the encryption of $D$ arbitrary plaintexts.

O2. For each plaintext-ciphertext pair $(m_i, c_i)$, if the $n - \log(S)$ LSBs of $m_i$ and
$c_i$ are not equal, discard it. Otherwise:

(a) For $j \in \{1, 2, \ldots, \ell\}$:

i. Assume that $Q_1(K, m_i) = m_i \oplus v_j = z_{ij}$ and calculate $P_2(z_{ij})$.

ii. Look for the value of $P_2(z_{ij}) \oplus c_i$ in $L_3$. If there is no match: if $j < \ell$
return to Step O2.(a), otherwise ($j = \ell$) return to Step O2.

iii. For each match of $P_2(z_{ij}) \oplus c_i$, obtain the value of $y$ (such that
$P_2(z_{ij}) \oplus c_i = P_3^{-1}(y) \oplus y = P'_3(P_3^{-1}(y))$) and test the key suggestion
$K' = y \oplus c_i$ by checking whether $Q_1(K', m_i) = m_i \oplus v_j$. If the test
succeeds, return $K'$, otherwise, if $j < \ell$ return to Step O2.(a), and
if $j = \ell$ return to Step O2.

The time complexity of the preprocessing phase is $S$ evaluations of $P_1$ and
$P_3^{-1}$, and its memory complexity is $S + \ell$. The memory complexity of the online
algorithm is also $S + \ell$. We denote by $t$ the average value of $t_1, t_2, \ldots, t_\ell$, and
thus there are $t\ell$ values of $x$ for which $Q_1(K, x) = x \oplus v_j$ for $j \in \{1, 2, \ldots, \ell\}$.
Consequently, in order to obtain a correct suggestion for the key, we require
that $(t\ell \cdot D \cdot S)/2^{2n} = 1$, implying that the data complexity of the attack is
$D = 2^{2n}/(t\ell \cdot S)$. We process a pair $(m_i, c_i)$ (i.e., we do not discard it in Step
O2) with probability $S/2^n$, and for each such pair we perform $\ell$ evaluations of $P_2$
and for an $S/2^n$ fraction of those we also evaluate $Q_1$ (or $P_1$). The expected time
complexity of the online algorithm is thus $\ell \cdot S/2^n \cdot D = 2^n/t$ evaluations of $P_2$,
$S/t$ evaluations of $P_1$, and $2^{2n}/(t\ell \cdot S)$ time required to generate the data.

Thus, the attack has about the same time complexity as the 2Round1KeyOpt
attack, and for $\ell = 1$ it is more efficient than the 3Round1Key Basic attack by
a factor of about $2^n/S$.


Concrete Parameters. For $n = 64$, let $S = 2^{60}$, i.e., $\lambda = 2^{60}/2^{64} = 2^{-4}$.
Again, we use the formula $(2^n \cdot \lambda^t e^{-\lambda})/t! = (2^{64} \cdot 2^{-4t} e^{-1/16})/t!$ with $t = 8$,
such that we expect at least $\ell = 2^{16}$ vertices with an in-degree of 8. With these
parameters, the time complexity of the preprocessing phase is $2^{60}$ evaluations of
$P_1$ (equivalent to about $2^{58.5}$ evaluations of the 3-round scheme), and its memory
complexity is $2^{60}$. The memory complexity of the online algorithm is $2^{60}$, its
expected data complexity is $2^{128}/(8 \cdot 2^{16} \cdot 2^{60}) = 2^{49}$ known plaintexts and its
expected time complexity is $2^{64}/8$ evaluations of $P_2$ and $2^{60}/8$ evaluations of $P_1$,
whose sum is equivalent to about $2^{59.6}$ time units (the time required to generate
the data is negligible). Note that it is possible to reduce the data complexity
further at the expense of increasing the time complexity by considering vertices
of a lower in-degree.⁶


Fig. 2. The online algorithm of 3Round1KeyOpt


6 For example, we expect more than $2^{23}$ vertices with an in-degree of at least 7, and
thus if we use only $2^{128}/(8 \cdot 2^{23} \cdot 2^{60}) = 2^{42}$ known plaintexts for the attack, the time
complexity of the online algorithm slightly increases from $2^{59.6}$ to about $2^{59.8}$.
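The Poisson estimate and the resulting data complexity quoted above can be re-derived numerically. The following sketch works entirely in base-2 logarithms to avoid overflow; the function names are hypothetical and the code only restates the formulas $(2^n \cdot \lambda^t e^{-\lambda})/t!$ and $D = 2^{2n}/(t\ell \cdot S)$ from the text.

```python
import math


def log2_expected_vertices(n_bits, log2_s, in_degree):
    """log2 of 2^n * lambda^t * e^(-lambda) / t!, with lambda = S / 2^n."""
    log2_lam = log2_s - n_bits
    lam = 2.0 ** log2_lam
    return (n_bits
            + in_degree * log2_lam
            - lam * math.log2(math.e)
            - math.log2(math.factorial(in_degree)))


def log2_data(n_bits, log2_s, log2_t_times_l):
    """log2 of D = 2^(2n) / (t * l * S)."""
    return 2 * n_bits - log2_t_times_l - log2_s
```

For $n = 64$, $S = 2^{60}$ and $t = 8$, `log2_expected_vertices(64, 60, 8)` evaluates to roughly 16.6, consistent with the choice $\ell = 2^{16}$, and `log2_data(64, 60, 3 + 16)` gives 49.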

3 Applications to Step-Reduced LED

LED is a 64-bit block cipher designed for resource-constrained environments,
proposed by Guo et al. at CHES 2011 [12]. The two main variants of LED
are LED-64 (which supports 64-bit keys) and LED-128 (which supports 128-bit
keys). The design of LED can be viewed as a special case of iterated EM
schemes: LED-64 is in fact an 8-step iterated EM scheme⁷ with one key and
LED-128 is a 12-step iterated EM scheme with alternating keys $K_1$ and $K_2$. The
inner permutations of LED are based on the AES design framework; however,
since our attacks do not exploit any properties of these permutations, we do not
specify them here and refer the reader to [12] for further details.

In the single-key model, the best attack published so far on reduced LED-64
breaks 2 steps of this cipher [13]. For LED-128, the largest number of attacked
steps was 6 (see [16]). In this paper, we use our generic attacks in order to improve
the data complexity of the attack on 6-step LED-128 from $2^{59}$ to $2^{45}$, while
keeping the time and memory complexities similar to the original attack. More
significantly, we present the first single-key attacks which are faster than exhaustive
search on 3-step LED-64 and on 8-step LED-128. The previous attacks on
LED (which are in the single-key rather than in the related-key model) and our
new attacks are summarized in Table 1. Note in particular that our new attack
on 8-step LED-128 actually has a slightly better time complexity and requires
about a thousand times less data than the best previous attack which could only
be applied to 6 steps of LED-128, out of the full 12.


Table 1. Single-Key Attacks on Step-Reduced LED

Reference    Cipher    Steps   Time          Data          Memory
[13]         LED-64    2       $2^{56}$      $2^{8}$ CP    $2^{11}$
This paper   LED-64    3       $2^{60.2}$    $2^{49}$ KP   $2^{60}$
[13]         LED-128   4       $2^{112}$     $2^{16}$ CP   $2^{19}$
[15]         LED-128   4       $2^{96}$      $2^{64}$ KP   $2^{64}$
[16]         LED-128   4       $2^{96}$      $2^{32}$ KP   $2^{32}$
[16]         LED-128   6       $2^{124.4}$   $2^{59}$ KP   $2^{59}$
This paper   LED-128   6       $2^{124.5}$   $2^{45}$ KP   $2^{60}$
This paper   LED-128   8       $2^{123.8}$   $2^{49}$ KP   $2^{60}$

The data complexity is given in chosen plaintexts (CP), or in known plaintexts (KP).


7 In the design of LED, the term “step” is used in order to describe what we refer to 
as a “round” of an iterated EM scheme. On the other hand, a “round” of LED is 
used in order to describe a smaller component of its internal permutation. Thus, in 
order to avoid confusion, we will use the term “step” in this section. 

3.1 An Attack on 6 Steps of LED-128

As was pointed out in [16], it is easy to reduce $2r + 2$ steps of LED-128 (with
its alternating use of two keys) into an iterated EM scheme variant with one key
by guessing $K_1$ and combining consecutive pairs of permutations (along with the
XORed key between them) into a single known permutation. In particular, [16]
reduced 6-step LED-128 into a 2-step iterated EM, which was relatively easy
to attack. Similarly, we guess $K_1$, and for each guess, we partially encrypt and
decrypt the given plaintext-ciphertext pairs and remain with a 2-step iterated
EM scheme with a single key ($K_2$). Thus, we can apply our 2-step iterated EM
attack (presented in Section 2.1) for each guess of $K_1$. However, we note that
the preprocessing phase of our 2Round1KeyOpt attack should be executed for
each guess of $K_1$, and it is thus now a part of the online algorithm of the attack
on LED-128. Moreover, the algorithm can no longer be performed in streaming
mode, as we need to reuse each plaintext-ciphertext pair for each guess of $K_1$.
The general framework of the algorithm is given below.

1. Ask for the encryption of $D$ arbitrary plaintexts and store them.

2. For each value of $K_1$:

(a) Apply the 2Round1KeyOpt attack (including the preprocessing steps)
on the resultant scheme, with plaintext-ciphertext pairs $(P_1(m_i \oplus K_1),
P_6^{-1}(c_i \oplus K_1))$. Test each returned key using another pair $(m_j, c_j)$.
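The reduction used in Step 2.(a) can be checked on a toy cipher. The sketch below is an illustration under stated assumptions: it replaces the LED step functions by random 8-bit lookup tables, models 6 steps with keys $K_1, K_2$ alternating before each step and a final addition of $K_1$, and uses hypothetical function names throughout. For the correct guess of $K_1$, peeling the outer steps leaves exactly a 2-step single-key EM scheme under $K_2$.

```python
import random

N_BITS = 8                   # toy block size, not LED's 64 bits
SIZE = 1 << N_BITS


def random_permutation(rng):
    table = list(range(SIZE))
    rng.shuffle(table)
    return table


def invert(perm):
    inverse = [0] * SIZE
    for x, y in enumerate(perm):
        inverse[y] = x
    return inverse


def encrypt_6step(m, k1, k2, perms):
    """Six steps with alternating keys K1, K2, ..., and a final XOR of K1."""
    state = m
    for index, perm in enumerate(perms):
        state = perm[state ^ (k1 if index % 2 == 0 else k2)]
    return state ^ k1


def reduce_to_2step(k1_guess, perms):
    """Merge (P2, K1, P3) and (P4, K1, P5) into two known permutations."""
    _, p2, p3, p4, p5, _ = perms
    r1 = [p3[p2[x] ^ k1_guess] for x in range(SIZE)]
    r2 = [p5[p4[x] ^ k1_guess] for x in range(SIZE)]
    return r1, r2


def peel(m, c, k1_guess, p1, p6_inv):
    """Partially encrypt m and decrypt c: (P1(m ^ K1), P6^-1(c ^ K1))."""
    return p1[m ^ k1_guess], p6_inv[c ^ k1_guess]


def encrypt_2step(m, key, r1, r2):
    """2-step iterated EM with a single key over the merged permutations."""
    return r2[r1[m ^ key] ^ key] ^ key
```

Building `r1` and `r2` costs evaluations of 4 of the 6 permutations per input, which is where the "4 out of the 6 permutations" accounting in the time estimate comes from.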

Using the parameters of our 2Round1KeyOpt attack (presented in Section
2.1), the expected data complexity of the attack is $2^{45}$ known plaintexts and
its memory complexity is $2^{60}$ (required for preprocessing, which is now part of
the online algorithm). We calculate the expected time complexity of the algorithm
as follows: adding the preprocessing and online time complexities, the
main procedure of the attack performed for each guess of $K_1$ requires about
$2^{60.1} + 2^{60} \approx 2^{61.1}$ evaluations of 4 out of the 6 permutations, which is equivalent
to about $2^{60.5}$ evaluations of the full scheme. Compared to this complexity, the
partial encryption and decryption of each $(m_i, c_i)$ pair, and the trial encryptions
using $(m_j, c_j)$ (performed on average once per guess of $K_1$) are negligible. Thus,
the expected time complexity of the attack is about $2^{64+60.5} = 2^{124.5}$, which is
about 11 times better than exhaustive search.


3.2 An Attack on 3 Steps of LED-64 

We can attack 3-step LED-64 by directly applying the 3Round1KeyOpt attack with
$n = 64$, presented in Section 2.2. Thus, the preprocessing phase has a time
complexity of about $2^{58.5}$ and memory complexity of $2^{60}$. The online algorithm
has a memory complexity of $2^{60}$, data complexity of $2^{49}$ known plaintexts and
time complexity of $2^{59.6}$. Since in this paper we consider the preprocessing time
as part of the attack (i.e., we assume that we are trying to attack the scheme for
the first time), the total time complexity of the algorithm is about $2^{60.2}$, which
is about 14 times better than exhaustive search.



3.3 An Attack on 8 Steps of LED-128 

We use the same framework of our 6-step attack on LED-128 in order to attack 8
steps of LED-128 (shown in Figure 3). Namely, we guess $K_1$, and for each guess,
we partially encrypt and decrypt the given plaintext-ciphertext pairs and remain
with a 3-step iterated EM scheme with a single key ($K_2$). We then apply our
3Round1KeyOpt attack (presented in Section 2.2) for each guess of $K_1$. Thus,
the memory complexity of the attack is $2^{60}$ and its data complexity is $2^{49}$ known
plaintexts. We calculate the expected time complexity of the algorithm as follows:
adding the preprocessing and online time complexities, the main procedure of
the algorithm performed for each guess of $K_1$ requires about $2^{58.5} + 2^{59.6} \approx 2^{60.2}$
evaluations of 6 out of the 8 permutations, equivalent to about $2^{59.8}$ evaluations
of the full scheme. Thus, the expected time complexity of the attack is about
$2^{64+59.8} = 2^{123.8}$, which is about 18 times better than exhaustive search.
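The rounded exponents quoted for the three LED attacks can be reproduced with a few lines of arithmetic on base-2 logarithms. This is only a sanity-check sketch with hypothetical helper names; it takes the per-phase figures from the text as inputs and does not re-derive them.

```python
import math


def log2_sum(*exponents):
    """Return log2(2^a + 2^b + ...) without forming the huge numbers."""
    top = max(exponents)
    return top + math.log2(sum(2.0 ** (e - top) for e in exponents))


def speedup(key_bits, log2_time):
    """Improvement factor over exhaustive search of a key_bits-bit key."""
    return 2.0 ** (key_bits - log2_time)


# 3-step LED-64: preprocessing plus online time.
led64_3step = log2_sum(58.5, 59.6)
# 6-step LED-128: per-guess work on 4 of the 6 permutations.
led128_6step_inner = log2_sum(60.1, 60.0) + math.log2(4 / 6)
# 8-step LED-128: per-guess work on 6 of the 8 permutations.
led128_8step_inner = 60.2 + math.log2(6 / 8)
```

The three values come out near 60.2, 60.5 and 59.8 respectively, and `speedup(64, 60.2)`, `speedup(128, 124.5)` and `speedup(128, 123.8)` round to the factors 14, 11 and 18 stated in the text.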



Fig. 3. 8-step LED-128 

4 Attacks on 2-Round Iterated Even-Mansour with 
Independent Keys 

The best known generic attack on 2-round iterated EM with independent keys
(see Figure 4) is a MITM attack. This attack is described in the full version of
this paper [7] and it requires $2^n$ memory and has a time complexity of about
$2^{n+1.6}$ full cipher evaluations.


Fig. 4. A 2-round iterated EM with independent keys 


In this attack, we use a property of the permutation $P_i$, which is shared by
the keyed permutation $Q_i(K_i, K_{i+1}, x) = P_i(x \oplus K_i) \oplus K_{i+1}$ for any value of
$K_i$ and $K_{i+1}$: these permutations have the same difference distribution table.
In order to demonstrate this, consider an entry with the value of $t$ in the difference
distribution table of $P_1$, and denote its input and output differences by
$\Delta_1$ and $\Delta_2$, respectively. Let us denote the $t$ corresponding input-output pairs⁸
by $((x_1, y_1), (x_1 \oplus \Delta_1, y_1 \oplus \Delta_2)), \ldots, ((x_t, y_t), (x_t \oplus \Delta_1, y_t \oplus \Delta_2))$. Then, the $t$
input-output pairs $((x_1 \oplus K_1, y_1 \oplus K_2), (x_1 \oplus K_1 \oplus \Delta_1, y_1 \oplus K_2 \oplus \Delta_2)), \ldots, ((x_t \oplus
K_1, y_t \oplus K_2), (x_t \oplus K_1 \oplus \Delta_1, y_t \oplus K_2 \oplus \Delta_2))$ correspond to the same entry in
the difference distribution table of $Q_1$ (i.e., the entry with input and output
differences $\Delta_1$ and $\Delta_2$, respectively).

8 In this paper, we consider unordered pairs, i.e., $((x, y), (u, v))$ and $((u, v), (x, y))$ are
considered the same pair.
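The invariance of the difference distribution table under key whitening is easy to confirm exhaustively for a small permutation. The sketch below assumes a 5-bit random lookup table and hypothetical function names; unlike footnote 8, it counts ordered pairs, so every entry is twice the unordered count, which does not affect the comparison.

```python
import random

N_BITS = 5
SIZE = 1 << N_BITS


def ddt(func):
    """Difference distribution table: table[d_in][d_out] = #x with that transition."""
    table = [[0] * SIZE for _ in range(SIZE)]
    for x in range(SIZE):
        for d_in in range(SIZE):
            table[d_in][func[x] ^ func[x ^ d_in]] += 1
    return table


def keyed(perm, k_in, k_out):
    """Lookup table of Q(x) = P(x ^ k_in) ^ k_out."""
    return [perm[x ^ k_in] ^ k_out for x in range(SIZE)]


def same_ddt_for_all_keys(perm):
    reference = ddt(perm)
    return all(ddt(keyed(perm, k_in, k_out)) == reference
               for k_in in range(SIZE) for k_out in (0, 1, SIZE - 1))
```

The equality holds because $Q(x) \oplus Q(x \oplus \Delta) = P(x \oplus K_i) \oplus P(x \oplus K_i \oplus \Delta)$, and $x \oplus K_i$ ranges over all inputs as $x$ does.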

Using the property above, if we find an entry $[\Delta_1, \Delta_2]$ in the difference distribution
table of $P_1$ with a large value, then we can use a similar attack to the one
given in [15] on 2-round iterated EM⁹ in order to break the scheme. However,
our main observation is that we can find such an entry by preprocessing the
public function $P_1$, which does not need to admit any special property in order
to attack the scheme. Thus, our attack adds a preprocessing algorithm to the
online algorithm of the attack of [15] (which assumes that we have an entry in
the difference distribution table of $P_1$ with a large value). In addition (as we will
see later), in the case of independent keys, the basic attack of [15] is not better
than exhaustive search, and we will need to add another non-trivial component
to this attack. The details of our unoptimized attack 2Round3Key Basic are
given below, where $S_1$, $S_2$, $D$ are parameters:

Preprocessing: 

PR1. Choose an arbitrary input difference $\Delta_1 \neq 0$ and evaluate $P_1$ on $S_1$ arbitrary
input pairs with input difference $\Delta_1$. For each pair $(x, P_1(x))$, $(x \oplus
\Delta_1, P_1(x \oplus \Delta_1))$, store the output difference $P_1(x) \oplus P_1(x \oplus \Delta_1)$ in a sorted
list, next to $x$.

PR2. Traverse the sorted list and find the most common output difference $\Delta_2$
(if there are several options for $\Delta_2$, choose one arbitrarily). Keep only the
entries of the list which correspond to pairs with the output difference of
$\Delta_2$ (assume that we have $t$ such pairs). For each such entry, recalculate
and store the full pair $(x, P_1(x))$, $(x \oplus \Delta_1, P_1(x \oplus \Delta_1))$.

PR3. Evaluate $P_2$ on $S_2$ arbitrary input pairs with input difference $\Delta_2$. For each
pair $(y, P_2(y))$, $(y \oplus \Delta_2, P_2(y \oplus \Delta_2))$, store the output difference $P_2(y) \oplus
P_2(y \oplus \Delta_2)$ in a sorted list $L_2$, next to $y$.
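Steps PR1 and PR2 can be sketched as a single pass with a counter. The code below is a toy illustration with hypothetical names: it assumes the permutation is given as a lookup table and, for simplicity, enumerates all $2^{n-1}$ unordered pairs with the chosen input difference rather than an arbitrary subset of size $S_1$.

```python
import random
from collections import Counter


def most_common_output_difference(perm, delta_in):
    """Return (delta_out, t, xs): the most frequent output difference for
    delta_in, its count t, and one representative x per contributing pair."""
    counts = Counter()
    representatives = {}
    for x in range(len(perm)):
        partner = x ^ delta_in
        if x < partner:                      # visit each unordered pair once
            delta_out = perm[x] ^ perm[partner]
            counts[delta_out] += 1
            representatives.setdefault(delta_out, []).append(x)
    delta_out, t = counts.most_common(1)[0]
    return delta_out, t, representatives[delta_out]
```

The returned list plays the role of the retained entries of PR2; the full pairs $(x, P_1(x))$, $(x \oplus \Delta_1, P_1(x \oplus \Delta_1))$ can be recomputed from each stored $x$.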

Online: 

O1. Ask for the encryption of $D$ arbitrary input pairs with difference $\Delta_1$.

O2. For each pair of plaintext-ciphertext pairs $((m_j^1, c_j^1), (m_j^2 = m_j^1 \oplus \Delta_1, c_j^2))$:

(a) Search for the output difference $c_j^1 \oplus c_j^2$ in $L_2$ (if there is no match,
discard the pair and return to Step O2).

(b) For each match $(y, P_2(y))$, $(y \oplus \Delta_2, P_2(y \oplus \Delta_2))$, we have 2 candidates
for $K_3$: $P_2(y) \oplus c_j^1$ and $P_2(y) \oplus c_j^2$. We also have $2t$ candidates for $K_1$: the
candidates $x \oplus m_j^1$ and $x \oplus m_j^2$ for each of the $t$ values of $x$. As each pair
of values for $K_1$ and $K_3$ suggests a value for $K_2$, we have $4t$ suggestions
of the full key to test using another plaintext-ciphertext pair.

9 Although the attack of [15] was previously applied to 2-round iterated EM with one
key, it can be adapted to work for the case of independent keys.

Similarly to our analysis of random functions, assuming that $P_1$ is a random
permutation, then each entry in its difference distribution table is distributed
according to the Poisson distribution [?].¹⁰ This will allow us to easily determine
the expected value of $t$ and use it in order to analyze the expected complexity
of our algorithm.

The memory complexity of the preprocessing phase is $\max(S_1, S_2)$, and its
time complexity is $2 \cdot S_1$ evaluations of $P_1$ and $2 \cdot S_2$ evaluations of $P_2$, or $S_1 + S_2$
evaluations of the full scheme. The memory complexity of the online algorithm is
$S_2$. Using the birthday paradox, out of the $D$ plaintext-ciphertext pairs evaluated
in the online phase, at least $(D \cdot t)/2^{n-1}$ are expected to have a difference of $\Delta_2$
after $P_1$ (note that we have $2^{n-1}$ unordered pairs with a given difference). Using
the same argument, we expect that $(D \cdot t \cdot S_2)/2^{2(n-1)}$ of them will match the pairs
evaluated for $P_2$ during preprocessing. Thus, we require that $(D \cdot t \cdot S_2)/2^{2(n-1)} =
1$, or $D = 2^{2(n-1)}/(t \cdot S_2)$ in order to find the key with high probability. Without
going into the details of the time complexity analysis, note that we are using
only two plaintext-ciphertext pairs to filter the key suggestions, tested in Step
O2.(b). As we have $3n$ bits of key and $2n$ bits of filtering, we need to test at
least $2^n$ keys in Step O2.(b), and thus the attack is not faster than the simple
MITM attack on this scheme.

4.1 A Time-Optimized Attack on 2-Round Iterated Even-Mansour 

In order to improve the attack, we need to add more filtering conditions, and 
thus we actually work on triplets, as described in the improved algorithm 
2Round3KeyOpt: 

Preprocessing: 

PR1. Choose an arbitrary input difference $\Delta_1 \neq 0$ and evaluate $P_1$ on $S_1$ arbitrary
input pairs with input difference $\Delta_1$. For each pair $(x, P_1(x))$, $(x \oplus
\Delta_1, P_1(x \oplus \Delta_1))$, store the output difference $P_1(x) \oplus P_1(x \oplus \Delta_1)$ in a sorted
list, next to $x$.

PR2. Traverse the sorted list and find the most common output difference $\Delta_2$
(if there are several options for $\Delta_2$, choose one arbitrarily). Keep only the
entries of the list which correspond to pairs with the output difference of
$\Delta_2$ (assume that we have $t$ such pairs). For each such entry, recalculate
and store the full pair $(x, P_1(x))$, $(x \oplus \Delta_1, P_1(x \oplus \Delta_1))$ in a list $L_1$.

PR3. Choose another non-zero input difference $\Delta'_1$. For each value $x$ stored in
$L_1$, evaluate $P_1$ an additional time to obtain the pair $(x \oplus \Delta'_1, P_1(x \oplus \Delta'_1))$.
Store the (total of) additional $t$ output differences $P_1(x) \oplus P_1(x \oplus \Delta'_1)$ in
a separate sorted list of differences, $L'_1$.

PR4. Evaluate $P_2$ on $S_2$ arbitrary input pairs with input difference $\Delta_2$. For each
pair $(y, P_2(y))$, $(y \oplus \Delta_2, P_2(y \oplus \Delta_2))$, store the output difference $P_2(y) \oplus
P_2(y \oplus \Delta_2)$ in a sorted list $L_2$, next to $y$.

10 However, we note that since we consider unordered pairs, then we have only $2^{n-1}$
possible pairs of a given difference, and each pair can attain (almost) all $2^n$ output
differences.

Online:

O1. Ask for the encryption of $D$ arbitrary input triplets of the form $m$, $m \oplus \Delta_1$
and $m \oplus \Delta'_1$ (for $D$ arbitrary values of $m$).

O2. For each pair of plaintext-ciphertext pairs $((m_j^1, c_j^1), (m_j^2 = m_j^1 \oplus \Delta_1, c_j^2))$:

(a) Search for the output difference $c_j^1 \oplus c_j^2$ in the list $L_2$ (if there is no
match, discard the pair and return to Step O2).

(b) For each match $(y, P_2(y))$, $(y \oplus \Delta_2, P_2(y \oplus \Delta_2))$, compute the 2 candidates
for $K_3$: $K'_3 = P_2(y) \oplus c_j^1$ and $K''_3 = P_2(y) \oplus c_j^2$.

(c) Denote the third plaintext-ciphertext pair in the triplet by $(m_j^3 = m_j^1 \oplus
\Delta'_1, c_j^3)$. Compute $y' = P_2^{-1}(c_j^3 \oplus K'_3)$ and $y'' = P_2^{-1}(c_j^3 \oplus K''_3)$.

(d) Search $L'_1$ for the four possibilities of the third difference obtained at
this stage: $y' \oplus y$, $y' \oplus \Delta_2 \oplus y$, $y'' \oplus y$, $y'' \oplus \Delta_2 \oplus y$ (if there is no match,
discard the pair and return to Step O2).

(e) Test the $4t$ suggestions of the full key using $(m_j^3, c_j^3)$. If the test succeeds,
return the key.

The time and memory complexities of the preprocessing phase are similar to
those of the 2Round3Key Basic attack (the additional $t$ evaluations of $P_1$ and $t$
units of storage are negligible). Using the calculation done for 2Round3Key Basic,
the online algorithm requires $D = 2^{2(n-1)}/(t \cdot S_2)$ plaintext-ciphertext triplets.
For each processed triplet, we expect to find a match in $L_2$ with probability $S_2/2^n$.
For each such matched triplet, we need to compute $P_2(y)$ (in order to compute $K'_3$
and $K''_3$) and evaluate $P_2^{-1}$ twice in order to compute $y'$ and $y''$. Once we do so, the
probability of a match in $L'_1$ in Step O2.(d) is proportional to $t/2^n$. This is a negligible
probability, and thus we can neglect the complexity of the trial encryptions
in Step O2.(e). Thus, the online time complexity (without counting the data) is
about $3 \cdot D \cdot S_2/2^n = 0.75 \cdot 2^n/t$ evaluations of $P_2$, or $0.375 \cdot 2^n/t$ evaluations of
the full scheme.

The data complexity of the attack is $D$ triplets, or $3D$ chosen plaintexts.
However, we can easily reduce it to $2D$ by requesting encryptions of structures
containing the messages $m$, $m \oplus \Delta_1$, $m \oplus \Delta'_1$ and $m \oplus \Delta_1 \oplus \Delta'_1$. Each such structure
of 4 plaintexts contains two triplets which we can exploit, implying that the data
complexity of the attack is indeed $2D$. If we add the time to generate the data to
the time complexity, we get that the total time complexity of the online attack
is about $2D + 0.375 \cdot 2^n/t$ evaluations of the full scheme.
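The claim that one structure of 4 plaintexts yields two usable triplets can be made concrete. In the sketch below (hypothetical function names, arbitrary example differences), the second triplet is based at $m \oplus \Delta_1 \oplus \Delta'_1$, and its first two elements form a pair with difference $\Delta_1$ that is distinct from the pair of the first triplet.

```python
def structure(m, delta1, delta1_prime):
    """The 4 chosen plaintexts requested per structure."""
    return [m, m ^ delta1, m ^ delta1_prime, m ^ delta1 ^ delta1_prime]


def triplets_in_structure(m, delta1, delta1_prime):
    """Two triplets (base, base ^ delta1, base ^ delta1') inside one structure."""
    second_base = m ^ delta1 ^ delta1_prime
    return [
        (m, m ^ delta1, m ^ delta1_prime),
        (second_base, second_base ^ delta1, second_base ^ delta1_prime),
    ]
```

Since 4 plaintexts serve 2 triplets, $D$ triplets cost $2D$ chosen plaintexts instead of $3D$.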

4.2 Applications to Full AES²

AES² is a 128-bit block cipher presented at Eurocrypt 2012 [3]. The cipher is a
2-round iterated EM construction, where each of the public permutations $P_1$ and
$P_2$ is based on an invocation of full AES-128 with a pre-fixed and publicly known
key. The designers of the scheme claim that its security is $2^{128}$. However, the best
attack known to the designers (as claimed in [3]) is the MITM attack presented
in [?], and based on our analysis, it has a slightly higher time complexity of
$3 \cdot 2^{128} \approx 2^{129.6}$ and a memory complexity of $2^{128}$.

In order to attack AES², we use our 2Round3KeyOpt attack with $S_1 = 2^{124}$
and $S_2 = 2^{125.4}$. This implies that the memory complexities of both the preprocessing
and online phases are $2^{125.4}$. The time complexity of the preprocessing
phase is $S_1 + S_2 = 2^{124} + 2^{125.4} \approx 2^{125.9}$ evaluations of the full scheme. Using
the formula $(2^n \cdot \lambda^t \cdot e^{-\lambda})/t!$ with $\lambda = 2^{124}/2^{128} = 1/16$ and $t = 18$, it is easy
to check that we expect to find at least 10 entries in the difference distribution
table with a value of 18 (we need only one). Plugging these values into the
formula $D = 2^{2(n-1)}/(t \cdot S_2)$, we obtain $D \approx 2^{124.4}$, implying that the data
complexity of the attack is $2^{125.4}$ chosen plaintexts. The time complexity of the
online attack is $2D + 0.375 \cdot 2^n/t \approx 2^{125.6}$, and adding the preprocessing time, the
total time complexity of the algorithm is about $2^{125.9} + 2^{125.6} \approx 2^{126.8}$. This is
better than the $2^{129.6}$ time complexity of the MITM attack by a factor of about
7, and clearly violates the 128-bit security claimed for AES² in [3]. We also note
that the memory complexity is improved from $2^{128}$ to about $2^{125.4}$; however, the
data complexity is greatly increased to $2^{125.4}$.
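The chain of estimates for AES² can be recomputed directly from the formulas of Section 4.1. The sketch below is a numerical sanity check only, with hypothetical function names and all quantities kept as base-2 logarithms; the default parameters are the values $S_1 = 2^{124}$, $S_2 = 2^{125.4}$ and $t = 18$ chosen in the text.

```python
import math


def log2_sum(*exponents):
    """Return log2(2^a + 2^b + ...)."""
    top = max(exponents)
    return top + math.log2(sum(2.0 ** (e - top) for e in exponents))


def expected_entries(n_bits, log2_s1, t):
    """Poisson estimate 2^n * lambda^t * e^(-lambda) / t!, lambda = S_1 / 2^n."""
    log2_lam = log2_s1 - n_bits
    log2_value = (n_bits + t * log2_lam
                  - (2.0 ** log2_lam) * math.log2(math.e)
                  - math.log2(math.factorial(t)))
    return 2.0 ** log2_value


def aes2_complexities(n_bits=128, log2_s1=124.0, log2_s2=125.4, t=18):
    log2_d = 2 * (n_bits - 1) - math.log2(t) - log2_s2        # D triplets
    log2_pre = log2_sum(log2_s1, log2_s2)                     # S_1 + S_2
    log2_online = log2_sum(log2_d + 1,                        # 2D data
                           math.log2(0.375) + n_bits - math.log2(t))
    return {
        "log2_D": log2_d,
        "log2_preprocessing": log2_pre,
        "log2_online": log2_online,
        "log2_total": log2_sum(log2_pre, log2_online),
    }
```

Evaluated with the default arguments, the exponents agree with the rounded figures $2^{124.4}$, $2^{125.9}$, $2^{125.6}$ and $2^{126.8}$ to within the rounding used in the text, and `expected_entries(128, 124.0, 18)` is slightly above 10.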

5 Conclusions 

In this paper we considered several schemes which are based on the iterated
Even-Mansour scheme, and improved their best known attacks. For the recommended
values of $n$ our attacks are between 7 and 20 times faster than exhaustive
search, but they differ from other improvements of exhaustive search since their
improvement factor is about $n/\log(n)$, which increases to infinity as $n$ grows.
In particular, we described the first attack on the full AES², and improved the
number of steps which can be attacked in the well-known LED-128 block cipher
from 6 to 8. Even though our attacks are not likely to be practically significant,
they indicate that block ciphers based on the EM scheme with one key
should have at least 4 rounds, regardless of how strong we make the internal
permutations.

Abstract. In this paper, we reveal a fundamental property of block
ciphers: There can exist linear approximations such that their biases $\epsilon$
are deterministically invariant under key difference. This behaviour is
highly unlikely to occur in idealized ciphers but persists, for instance, in
5-round AES. Interestingly, the property of key difference invariant bias
is independent of the bias value $\epsilon$ itself and only depends on the form of
linear characteristics comprising the linear approximation in question as
well as on the key schedule of the cipher.

We propose a statistical distinguisher for this property and turn it
into a key recovery attack. As an illustration, we apply our novel cryptanalytic
technique to mount related-key attacks on two recent block ciphers,
LBlock and TWINE. In these cases, we break 2 and 3 more rounds,
respectively, than the best previous attacks.

Keywords: block ciphers, key difference invariant bias, linear crypt- 
analysis, linear hull, key-alternating ciphers, LBlock, TWINE. 


1 Introduction 

1.1 Linear Cryptanalysis, Linear Approximations, and Bias 

Linear cryptanalysis is a central and indispensable attack on block ciphers. Having
been proposed as early as in 1992 [?], it forms an established research
field within symmetric-key cryptology. Since then, many interesting results have
been obtained in the area, among others including correlation matrices by Daemen
et al. [5], multiple linear cryptanalysis by Kaliski and Robshaw [?], linear
hull effect by Nyberg [23], multidimensional cryptanalysis by Hermelin et al. [13],
comprehensive bounds on linear properties by Keliher and Sui [13], as well as
success probability estimations by Selçuk [35].

The basis of linear cryptanalysis is a linear approximation of a function $f$.
If the linear approximation holds with probability $1/2 + \epsilon$, $\epsilon$ is called its bias.
A linear approximation can comprise numerous linear characteristics $\theta$, each
contributing their linear characteristic bias $\epsilon_\theta$ to the linear approximation bias $\epsilon$.

There are essentially two standard approaches to deal with the key-dependency
of these biases: they are either averaged over all keys or evaluated for a fixed
key. Both cases have been studied in great detail and these approaches have
turned out to be very fruitful: While the average behaviour of the bias is vital
to the foundations of linear cryptanalysis and the demonstration of the linear
hull effect, Murphy has demonstrated [27] that there can be keys for which the
linear distinguisher might not apply. The latter observation is more in line with
the fixed-key correlation-matrix approach, which also, among others, has led
to zero-correlation attacks by Bogdanov et al. [?] and improved linear attacks
on PRESENT by Cho [5].

Apart from the average case and the fixed-key case, recently, Abdelraheem
et al. [1] have managed to compute the distribution of linear characteristic bias
for several interesting examples. Moreover, there has been quite some interest
towards deducing key information from the value of the bias [7, 28, 30]. Kim [19]
studies the combined related-key linear-differential attacks on block ciphers.
Interestingly, a linear-hull version of Matsui’s Algorithm 1 by Röck and Nyberg [32]
uses the fact that, in some ciphers, the linear characteristic biases $\epsilon_\theta$ are the same
for different keys.

At the same time, much less is known about the even more fundamental
question of how the bias $\epsilon$ of the entire linear approximation behaves under a
change of key. This is not least due to the fact that the entire linear hull is
notoriously difficult to analyze for the immense number of linear characteristics
$\theta$ comprising it. In this paper, we tackle this problem and reveal a property for
many block ciphers, namely, that the bias $\epsilon$ of a linear approximation can be
actually invariant under the modification of the key.

1.2 Our Contributions 

The contributions of this paper are as follows. 


Bias Invariant under Key Difference in Iterative Block Ciphers. We
investigate the bias of a linear approximation in key-alternating ciphers (iterative
SPN ciphers with XOR addition of subkeys) under a change of the key. By
looking at the composition of the fixed-key linear hull from individual characteristics,
we derive a sufficient condition on the keys and linear approximations such
that the bias remains unaffected by a change of key. The class of key-alternating
ciphers is already broad enough to include AES, most of the other SPN ciphers,
and some Feistel ciphers. After recalling some background on linear cryptanalysis
in Section 2, we describe these findings in Section 3.


An Instructive Example with AES. With our technique, the key difference
invariant bias property is easy to construct over (a part of) susceptible ciphers
since it mainly depends on the differential diffusion in the key schedule and on
the linear diffusion in the data transform of a cipher. We use AES to show how
the property can be derived. Namely, we demonstrate a key difference invariant
bias property holding deterministically over 5 rounds of the original AES-256.
This serves as a pedagogical illustration. See Section 3.3.

Statistical Distinguisher and Generic Key Recovery. The probability to
have the key difference invariant bias property in an idealized block cipher with
block size n is about $\frac{1}{\sqrt{2\pi}} 2^{\frac{3-n}{2}}$. This forms the basis for a statistical
distinguisher that can be used for key recovery. Here, we use the fact that the key
difference invariant bias property is actually truncated, i.e., there are many linear
approximations with key difference invariant bias in most susceptible ciphers. In our
distinguisher, for two keys, we compute the sample biases of a set of approximations
with this property (using the part of the plaintext-ciphertext pairs available
to the adversary) and test their collective proximity. We demonstrate that it is
possible to efficiently distinguish this from an idealized cipher, under some basic
independence assumptions. The distinguisher can be used for hash functions and
block ciphers. In the related-key setting, we propose a key recovery procedure
for block ciphers which is similar to Matsui's Algorithm 2. These techniques are
given in Section 4.

Applications to Block Ciphers LBlock and TWINE. As an illustration,
we apply our new cryptanalytic technique of key difference invariant bias to the
recently proposed block ciphers LBlock [35] and TWINE [37]. LBlock was designed
by Wu et al. and presented at ACNS 2011. Its state and key sizes are
64 and 80 bits, respectively. LBlock has received the attention of many cryptographers
and various attacks have been published so far on some reduced
versions [16,20-22,26,33,34]. The best attack breaks 22 rounds of the cipher.
TWINE is a block cipher proposed at SAC 2012 by Suzaki et al. that operates
on a 64-bit state and is parameterized by keys of length 80 or 128 bits.
The total number of rounds is 36. The best known attack on TWINE-128 is an
impossible differential attack given in [37] that breaks 24 rounds of the cipher.

We identify key difference invariant bias properties over 16 rounds of LBlock
and 17 rounds of TWINE-128. This allows us to attack 24-round LBlock and
27-round TWINE-128 in the classical related-key model with differences in the
user-supplied master keys. Thus, our attacks improve upon the state-of-the-art
cryptanalysis for both LBlock and TWINE by breaking 2 and 3 more rounds,
respectively, than the best previous attacks. Our cryptanalysis is provided in
Sections 5 and 6.

2 Preliminaries 

2.1 Key- Alternating Ciphers 

A block cipher operating on n-bit blocks with a k-bit key can be seen as a
subset of cardinality $2^k$ of the set of all $2^n!$ permutations over the space of
n-bit strings. In an idealized block cipher, this subset is randomly chosen. In
all practical settings, however, one is concerned with efficiently implementable
block ciphers. So all block ciphers used in practice contain at their core the
iterative application of r similar invertible transformations (called rounds).
Key-alternating block ciphers form a special but important subset of the modern block
ciphers (see Figure 1):

Definition 1 (Key-alternating block cipher [9]). Let each round i, $1 \le i \le r$,
of a block cipher have its own n-bit subkey $k_i$. This block cipher is
key-alternating if the key material in round i is introduced by XORing the subkey $k_i$
to the state at the end of the round. Additionally, the subkey $k_0$ is XORed with
the plaintext before the first round.

The r + 1 round subkeys $k_0, k_1, \ldots, k_{r-1}, k_r$ build the expanded key K (of length
n(r + 1) bits), which is derived from the user-supplied key $\kappa$ using a key-schedule
algorithm $\varphi$. Numerous popular and widely used block ciphers belong to the
class of key-alternating block ciphers. Among others, almost all SPNs (including
AES) and some Feistel ciphers are key-alternating [Tl] .
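
The structure of Definition 1 can be sketched in a few lines of Python. This is only an illustration: the function name `key_alternating_encrypt`, the 8-bit block size, the S-box values and the rotation used in `toy_round` are assumptions made for the example and do not describe any cipher analyzed in this paper.

```python
from typing import Callable, Sequence

# Illustrative 4-bit S-box (any bijective table would serve the same purpose).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def toy_round(state: int) -> int:
    """Hypothetical unkeyed round on 8 bits: two S-boxes, then a 1-bit rotation."""
    y = (SBOX[state >> 4] << 4) | SBOX[state & 0xF]
    return ((y << 1) | (y >> 7)) & 0xFF


def key_alternating_encrypt(plaintext: int,
                            subkeys: Sequence[int],
                            rounds: Sequence[Callable[[int], int]]) -> int:
    """XOR k_0 into the plaintext, then XOR k_i at the end of round i."""
    if len(subkeys) != len(rounds) + 1:
        raise ValueError("need r + 1 subkeys for r rounds")
    state = plaintext ^ subkeys[0]
    for round_fn, k in zip(rounds, subkeys[1:]):
        state = round_fn(state) ^ k
    return state
```

With r rounds the expanded key is simply the list of r + 1 subkeys; how it is derived from the user-supplied key is left out of this sketch.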



Fig. 1. Key-alternating cipher 


2.2 Linear Approximations and Bias 

We briefly recall the concepts of linear approximations and bias. We denote the
scalar product of binary vectors by $a^t x = \bigoplus_{i=1}^{n} a_i x_i$. Linear cryptanalysis is
based on linear approximations determined by an input mask a and an output mask
b. A linear approximation (a, b) of a vectorial function f has a bias defined by

$\epsilon^{f}_{a,b} = \Pr_x\{b^t f(x) \oplus a^t x = 0\} - 1/2$,

to which we also refer simply as $\epsilon$ if its assignment to function and linear
approximation is clear from the context. We call a linear approximation trivial if
both a and b are zero. Otherwise, with both $a \neq 0$ and $b \neq 0$, it is non-trivial.
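
For small functions the bias can be computed by exhaustive counting. The sketch below assumes masks and values are packed into Python integers; the helper names `parity` and `bias` are illustrative.

```python
def parity(v: int) -> int:
    """Parity of the bits of v, i.e. the XOR of all its bits."""
    return bin(v).count("1") & 1


def bias(f, n_bits: int, a: int, b: int) -> float:
    """Bias of the approximation (a, b) of f: Pr_x[b.f(x) xor a.x = 0] - 1/2."""
    size = 1 << n_bits
    hits = sum(1 for x in range(size)
               if parity(b & f(x)) ^ parity(a & x) == 0)
    return hits / size - 0.5
```

For the identity function on 4 bits, `bias(lambda x: x, 4, 0b1010, 0b1010)` gives 1/2 (the approximation always holds), while two different masks give a bias of 0.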


2.3 Linear Characteristics and Linear Hulls 

A linear approximation (a, b) of an iterative block cipher (e.g. a key-alternating
block cipher of Definition 1) is called a linear hull in [22]. The linear hull
contains all possible sequences of the linear approximations over individual rounds,
with input mask a and output mask b. These sequences are called linear
characteristics, which we denote by $\theta$. Now we recall the relations between the bias
of a linear characteristic and the bias of the entire linear hull it belongs to, for
key-alternating block ciphers.

Given a linear hull (a, b), a linear characteristic $\theta$ is the concatenation of an
input mask $a = \theta_0$ before the first round, an output mask $b = \theta_r$ after the last
round, and r - 1 intermediate masks $\theta_i$ between rounds i - 1 and i:

$\theta = (\theta_0, \theta_1, \ldots, \theta_{r-1}, \theta_r)$.    (1)

Thus, each linear characteristic consists of n(r + 1) bits (cf. the length of the
expanded key K). The bias $\epsilon_\theta$ of the linear characteristic $\theta$ is defined as the
scaled product of the individual biases $\epsilon_{\theta_{i-1},\theta_i}$ over each round:

$\epsilon_\theta = 2^{r-1} \prod_{i=1}^{r} \epsilon_{\theta_{i-1},\theta_i}$.

In a key-alternating cipher, only the sign of $\epsilon_\theta$ depends on the key value, while
the absolute bias value $|\epsilon_\theta|$ remains exactly the same for all keys. As a reference
point, we denote by $d_\theta \in \{0, 1\}$ the sign of the linear characteristic bias with
expanded key K = 0:

$\epsilon_\theta[0] = (-1)^{d_\theta} |\epsilon_\theta|$.
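
As a small numeric illustration of the scaled product and of the key-dependent sign, the following sketch assumes that masks and expanded keys are packed into integers and that the sign exponent is $d_\theta \oplus \theta^t K$; the function names are illustrative.

```python
from math import prod


def characteristic_bias(round_biases):
    """Scaled product 2^(r-1) * prod(eps_i) of the r per-round biases."""
    r = len(round_biases)
    return 2 ** (r - 1) * prod(round_biases)


def signed_characteristic_bias(abs_bias, d_theta, theta, expanded_key):
    """Apply the sign (-1)^(d_theta xor theta.K) to the key-independent |eps_theta|."""
    theta_dot_key = bin(theta & expanded_key).count("1") & 1
    return -abs_bias if (d_theta ^ theta_dot_key) else abs_bias
```

For instance, two rounds with bias 1/4 each combine to a characteristic bias of 1/8, and flipping a key bit that is selected by the mask flips only the sign.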

Now we formulate the following central proposition that deterministically
connects the linear approximation bias with the individual linear characteristic
biases through a fixed key value:

Proposition 1 ([9, Subsection 7.9.2]). For a key-alternating block cipher,
the bias $\epsilon$ of a non-trivial linear hull (a, b) is

$\epsilon = \sum_{\theta:\, \theta_0 = a,\, \theta_r = b} (-1)^{d_\theta \oplus \theta^t K} |\epsilon_\theta|$.

We will be relying on Proposition 1 in the next section to determine when $\epsilon$ is
invariant under a change of key.


3 Towards Bias Invariant under Key Difference 

For a non-trivial linear hull (a, b) of a block cipher, let $\epsilon$ and $\epsilon'$ be two biases
under two distinct keys $\kappa$ and $\kappa'$, respectively. Now we consider when $\epsilon = \epsilon'$
with $\kappa \neq \kappa'$, that is, when the bias is invariant under a change of key.


3.1 Key Difference Invariant Bias in Key- Alternating Ciphers 

In a key-alternating block cipher, let K and K' be the expanded keys
corresponding to two user-supplied keys $\kappa$ and $\kappa'$, $K = \varphi(\kappa)$ and $K' = \varphi(\kappa')$ for
key schedule $\varphi$ as in Section 2, such that $K' = K \oplus \Delta$, where the difference $\Delta$
describes a connection between K and K'. We will now derive a condition on $\Delta$
and $\theta$ such that the value of the linear approximation bias, $\epsilon = \epsilon'$, is unaffected by
the key change $\kappa \neq \kappa'$.

In a key-alternating cipher, the bias for an expanded key can be computed
due to Proposition 1. That is:

$\epsilon = \sum_{\theta:\, \theta_0 = a,\, \theta_r = b} (-1)^{d_\theta \oplus \theta^t K} |\epsilon_\theta|$  and  $\epsilon' = \sum_{\theta:\, \theta_0 = a,\, \theta_r = b} (-1)^{d_\theta \oplus \theta^t K'} |\epsilon_\theta|$.    (2)

We want to attain the equality $\epsilon = \epsilon'$, so we study when both sides of (2) are
equal. One can observe that the only parts that differ are the signs of the
individual linear characteristic biases. Therefore, the equation will hold if all the
signs are equal, that is, if the following is satisfied for each $\theta$:

$d_\theta \oplus \theta^t K = d_\theta \oplus \theta^t K'$.    (3)

Since $d_\theta$ is the same, (3) holds if and only if $\theta^t (K \oplus K') = 0$. Recalling that we
denote $K \oplus K'$ by $\Delta$, we have the following statement:

Theorem 1 (Key difference invariant bias for key-alternating ciphers).
Let (a, b) be a non-trivial linear hull of a key-alternating block cipher. Its biases
$\epsilon$ for expanded key K and $\epsilon'$ for expanded key K' with $K = K' \oplus \Delta$ have exactly
equal values $\epsilon = \epsilon'$ if $\theta^t \Delta = 0$ for each linear characteristic $\theta$ of the linear hull
(a, b) with $\epsilon_\theta \neq 0$.

Theorem 1 yields a sufficient condition on the relation between the masks of
linear characteristics and the expanded key difference for the key difference
invariant bias property to hold. We will deal with this in the next subsection.

3.2 Sufficient Condition for Key Difference Invariant Bias 

For a fixed pair of keys K and K', the difference $\Delta$ connecting them is also
constant. At the same time, the linear masks $\theta$ will be different for each linear
characteristic in the given linear hull (a, b). Thus, $\Delta$ can be seen as a linear mask
itself on $\theta$ that chooses certain positions in characteristics $\theta$, cf. (3).

In a linear characteristic $\theta$, we address each of the n(r + 1) bits by $\theta(j)$,
j = 1, ..., n(r + 1). We focus on bit positions $\theta(j)$ in linear characteristics $\theta$
such that $\theta(j) = 0$ for all $\theta$ with $\epsilon_\theta \neq 0$. We call such positions zero positions.
Otherwise, a position is called a nonzero position.

Now we are ready to formulate a more explicit sufficient condition for
deterministically keeping $\theta^t \Delta = 0$:

Condition 1 (Sufficient condition for key difference invariant bias). For
a fixed non-trivial linear approximation (a, b) of a key-alternating block cipher,
the relation between a pair of the user-supplied keys $\kappa$ and $\kappa'$ is such that the
expanded key difference $\Delta = K \oplus K'$ chooses an arbitrary number of zero
positions and no nonzero positions in the linear characteristics $\theta$ of the linear hull
with $\epsilon_\theta \neq 0$.

Once Condition 1 is fulfilled, Theorem 1 becomes applicable with $\theta^t \Delta = 0$ and
yields $\epsilon = \epsilon'$.
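
The statement can be checked exhaustively on a toy cipher. The sketch below is an assumption-laden illustration and not one of the ciphers studied here: it uses a 16-bit state of four nibbles, an arbitrary 4-bit S-box table, a deliberately slow linear layer and three rounds with independent subkeys. For the input mask on nibble 0 only, the masks of all characteristics with nonzero bias vanish on nibbles 1, 2, 3 of $\theta_0$, on nibbles 1, 2 of $\theta_1$ and on nibble 1 of $\theta_2$, so the difference `DELTA` touches zero positions only.

```python
from itertools import product

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def encrypt(p, subkeys):
    """Toy key-alternating cipher on four nibbles; subkeys[0] is added first."""
    x = [v ^ k for v, k in zip(p, subkeys[0])]
    for k in subkeys[1:]:
        s = [SBOX[v] for v in x]                 # S-box layer
        y = [s[1], s[2], s[3], s[0] ^ s[1]]      # linear layer with slow diffusion
        x = [v ^ kk for v, kk in zip(y, k)]      # subkey XOR at the end of the round
    return x


def dot(mask, value):
    """Scalar product over GF(2) of two nibble vectors."""
    acc = 0
    for m, v in zip(mask, value):
        acc ^= m & v
    return bin(acc).count("1") & 1


def scaled_bias(a, b, subkeys):
    """Exact value of 2^16 * bias, computed over all 2^16 plaintexts."""
    hits = sum(1 for p in product(range(16), repeat=4)
               if dot(a, p) == dot(b, encrypt(p, subkeys)))
    return hits - 2 ** 15


a = (0x9, 0, 0, 0)                 # input mask: nibble 0 only
b = (0x5, 0x2, 0xB, 0x5)           # output mask
K = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 0)]
# Expanded key difference restricted to zero positions of the characteristics.
DELTA = [(0, 0xA, 0x3, 0xF), (0, 0x7, 0x3, 0), (0, 0x6, 0, 0), (0, 0, 0, 0)]
K_PRIME = [tuple(k ^ d for k, d in zip(ks, ds)) for ks, ds in zip(K, DELTA)]
```

Evaluating `scaled_bias(a, b, K)` and `scaled_bias(a, b, K_PRIME)` (about 2^17 toy encryptions in total) returns the same integer, as Theorem 1 predicts; a difference placed on a nonzero position carries no such guarantee.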

In the next subsection, for instructive and pedagogical purposes, we show one
example of the key difference invariant bias property using Condition 1 with AES.
3.3 The Instructive Example of AES


Here we provide an illustration of the key difference invariant bias property
for AES. The goal of this section is mainly pedagogical and we simply aim
to show how such a property can be derived in practice. We demonstrate a
key difference invariant bias property for reduced-round AES-256. We provide
an example where Condition 1 is satisfied, which in turn makes Theorem 1
applicable.

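For AES-256, let the two user-supplied 32-byte keys be connected by

$\kappa \oplus \kappa' = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & \delta & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$    (4)

with the first byte $\delta \neq 0$ of the 7-th column being the only non-zero byte.
Furthermore, let the (truncated) linear approximation be defined by the 16-byte
input/output masks: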


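$a = \begin{pmatrix} \alpha & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$  and  $b = \begin{pmatrix} \beta & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$.    (5)

The masks define a linear hull for any non-zero byte values $\alpha$ and $\beta$. We show
that the key difference (4) and the linear hulls (5) result in the key difference
invariant bias property for 5 rounds of AES-256.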

The AES data transform diffuses a single-byte input mask to the full state only
after two rounds. Analogously, a single-byte output mask applies to the full state
only after three rounds of backward computations. This fact makes Condition 1
applicable to AES. The byte positions involved in the propagation of linear
patterns over 5 rounds of AES with a and b above as input/output masks are
marked in Figure 2. Correspondingly, byte positions not involved are depicted
as □. Since AddRoundKey is an addition with a constant and MixColumns is an
affine operation, one can exchange their order under a suitable modification
of the subkey value. In this case, ShiftRows is followed directly by the modified
AddRoundKey (AK'), which is the case in the last round of Figure 2.

We track the propagation of the difference in the user-supplied key to the
expanded key difference, which is shown as X in Figure 2. The difference $\kappa \oplus \kappa'$
specified above satisfies Condition 1: in Figure 2, all non-zero bytes X of $\Delta$ are
concentrated only in impossible positions □ of $\theta$ (zero positions) and do not
interfere with the involved positions.

Thus, $\epsilon = \epsilon'$ is fulfilled with probability 1 and the key difference invariant bias
property holds deterministically.


3.4 Key Difference Invariant Bias and Idealized Cipher 

Fig. 2. Key difference invariant bias for 5 rounds of AES-256

In random block ciphers, the bias $\epsilon$ under a fixed key is the bias for a fixed
randomly drawn permutation. Using m Theorem 4.7], one can demonstrate that
the probability for the biases with two different keys to be exactly equal is
$\Pr\{\epsilon = \epsilon' \mid \kappa \neq \kappa'\} \approx \frac{1}{\sqrt{2\pi}} 2^{\frac{3-n}{2}}$ for block sizes $n \ge 5$. Thus, the key difference
invariant property for idealized block ciphers is a rare event, which yields a
distinguisher for susceptible ciphers outlined in the next section.
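
To get a feeling for how rare this event is, the probability can be evaluated numerically. The sketch below rests on two assumptions that are not spelled out in the text: that the correlation of a random n-bit permutation is an integer multiple of $2^{2-n}$ and is approximately normal with mean 0 and variance $2^{-n}$, and that the resulting collision probability is $\frac{1}{\sqrt{2\pi}} 2^{(3-n)/2}$. The second function sums the squared point probabilities directly as a cross-check.

```python
import math


def equal_bias_probability(n: int) -> float:
    """Closed form (assumed): 2^((3 - n) / 2) / sqrt(2 * pi)."""
    return 2 ** ((3 - n) / 2) / math.sqrt(2 * math.pi)


def equal_bias_probability_by_summation(n: int) -> float:
    """Sum of squared point probabilities of a discretised N(0, 2^-n)."""
    step = 2.0 ** (2 - n)          # assumed granularity of the correlation
    sigma = 2.0 ** (-n / 2)        # assumed standard deviation
    k_max = int(12 * sigma / step)
    total = 0.0
    for k in range(-k_max, k_max + 1):
        x = k * step
        density = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        total += (step * density) ** 2
    return total
```

For n = 64 the closed form gives roughly $2^{-31.8}$; the summation is only practical for small n (for example n = 16) and agrees with the closed form there.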

4 Statistical Distinguisher and Key Recovery with Key 
Difference Invariant Bias 

In this section, we present the statistical distinguisher based on the key differ- 
ence invariant bias for an n-bit block cipher, followed by a generic key recovery 
procedure. 

4.1 Distinguisher 

In the distinguisher, our aim is to tell if we deal with the target cipher
featuring the property or an idealized cipher. The setup for the statistical test is as
follows. Suppose that we are given N plaintext-ciphertext pairs and $\lambda$ linear
approximations under a pair of expanded keys (K, K') connected by $\Delta$ in the way
described in Condition 1. Then, for each one of these linear approximations we
compute and store counters $S_i$ and $S'_i$, $1 \le i \le \lambda$, which account for the
number of times these approximations are satisfied for K and K' with the N texts.
The counters $S_i$ and $S'_i$ suggest empirical biases $\hat{\epsilon}_i = \frac{S_i}{N} - \frac{1}{2}$ and
$\hat{\epsilon}'_i = \frac{S'_i}{N} - \frac{1}{2}$, respectively. We consequently evaluate the following statistic s:

$s = \sum_{i=1}^{\lambda} \left[ \left( \frac{S_i}{N} - \frac{1}{2} \right) - \left( \frac{S'_i}{N} - \frac{1}{2} \right) \right]^2$.

We expect the statistic s to be lower for the target cipher, featuring the
key difference invariant bias property, than for a random cipher. As we aim to
perform key recovery with this test, we will derive the distribution of this statistic
for the right key guess (assuming the target structure) and for the wrong key
guess (assuming a random cipher).

Right Key Guess. The empirical bias value $\hat{\epsilon}_i$ for the i-th linear approximation
approximately follows the normal distribution with the exact value of the bias $\epsilon_i$ as
mean and variance 1/(4N), with good precision (cf., e.g., [21135]) for sufficiently
large N:

$\hat{\epsilon}_i \sim \mathcal{N}(\epsilon_i, 1/(4N))$.

In this case, the following proposition holds:

Proposition 2 (Distribution of statistic s for the right key). Consider
$\lambda$ nontrivial linear approximations for a block cipher under a pair of expanded
keys (K, K') connected by $\Delta$ conforming to Condition 1. If N is the number
of known plaintext-ciphertext pairs, $S_i$ and $S'_i$ are the numbers of times such a
linear approximation is fulfilled for K and K', respectively, $i \in \{1, \ldots, \lambda\}$, and
$\lambda$ is high enough, then, assuming the counters $S_i$ and $S'_i$ are all independent, the
following approximate distribution holds for sufficiently large N and n:

$s \sim \mathcal{N}\left( \frac{\lambda}{2N}, \frac{\lambda}{2N^2} \right)$.

Proof. See the full version of this paper [5].
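
The mean of s in the right-key case can be sanity-checked by simulation. The sketch below covers only the special case in which every approximation has exact bias 0 under both keys (so $\epsilon_i = \epsilon'_i$ holds trivially) and all counters are independent; under the variance 1/(4N) quoted above, each squared difference then has expectation 1/(2N), so the mean of s should be close to $\lambda/(2N)$. The parameter values and function names are illustrative.

```python
import random


def statistic_s(S, S_prime, N):
    """s = sum_i [(S_i/N - 1/2) - (S'_i/N - 1/2)]^2."""
    return sum(((x / N - 0.5) - (y / N - 0.5)) ** 2 for x, y in zip(S, S_prime))


def simulate_right_key_mean(N, lam, trials, seed=1):
    """Average s over many trials with unbiased, independent counters."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # popcount of N random bits is Binomial(N, 1/2), i.e. a counter with bias 0
        S = [bin(rng.getrandbits(N)).count("1") for _ in range(lam)]
        S_prime = [bin(rng.getrandbits(N)).count("1") for _ in range(lam)]
        total += statistic_s(S, S_prime, N)
    return total / trials
```

With N = 128 and $\lambda$ = 64 the simulated mean should land near 64/(2 · 128) = 0.25.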

Wrong Key Guess. In this case, we rely on the hypothesis that for a wrong
key, we deal with a random cipher consisting of permutations drawn at random.
Then, each of the values $\hat{\epsilon}_i$ can be approximated by a normal distribution with
mean $\epsilon_i$ and variance 1/(4N) for sufficiently large N:

$\hat{\epsilon}_i \sim \mathcal{N}(\epsilon_i, 1/(4N))$ with $\epsilon_i \sim \mathcal{N}(0, 1/2^{n+2})$,

where $\epsilon_i$ is the exact value of the bias, which is itself distributed over n-bit
random permutations for $n \ge 5$ mm-

We then have the following proposition for the distribution of the
statistic s:

Proposition 3 (Distribution of statistic s for the wrong key). Consider
$\lambda$ nontrivial linear approximations for two randomly drawn permutations. If N is
the number of known plaintext-ciphertext pairs, $S_i$ and $S'_i$ are the numbers of times
a linear approximation is fulfilled for these two permutations, $i \in \{1, \ldots, \lambda\}$, and
$\lambda$ is high enough, then, assuming the independence of all $S_i$ and $S'_i$, the following
approximate distribution holds for sufficiently large N and n:

$s \sim \mathcal{N}\left( \frac{\lambda}{2N} + \frac{\lambda}{2^{n+1}}, \; \frac{\lambda}{2N^2} + \frac{\lambda}{2^{2n+1}} + \frac{\lambda}{N 2^{n}} \right)$.

Proof. See the full version of this paper [5].

Data Complexity of Distinguisher. In the two cases above, we have seen
that the statistic s follows one of two different normal distributions, depending
on whether we deal with the right or the wrong key. In the first case, it follows the
normal distribution with mean $\mu_0 = \frac{\lambda}{2N}$ and variance $\sigma_0^2 = \frac{\lambda}{2N^2}$, while in the second case
it follows the normal distribution with mean $\mu_1 = \frac{\lambda}{2N} + \frac{\lambda}{2^{n+1}}$ and variance
$\sigma_1^2 = \frac{\lambda}{2N^2} + \frac{\lambda}{2^{2n+1}} + \frac{\lambda}{N 2^{n}}$. It has to be decided whether the obtained statistic s is
from $\mathcal{N}(\mu_0, \sigma_0^2)$ or from $\mathcal{N}(\mu_1, \sigma_1^2)$. To do that, we perform a test that compares the
statistic s to a threshold value $\tau$. This test says that s belongs to $\mathcal{N}(\mu_0, \sigma_0^2)$ if $s \le \tau$
and that s belongs to $\mathcal{N}(\mu_1, \sigma_1^2)$ otherwise.

As in any statistical test, one has to deal with two types of error probabilities
here. The first one, denoted by $\alpha_0$, is the probability to reject the right key,
whereas the second one, denoted by $\alpha_1$, is the probability to accept a wrong
key. The decision threshold used is $\tau = \mu_0 + \sigma_0 q_{1-\alpha_0} = \mu_1 - \sigma_1 q_{1-\alpha_1}$, where
$q_{1-\alpha_0}$ and $q_{1-\alpha_1}$ are the quantiles of the standard normal distribution $\mathcal{N}(0, 1)$.
This simple test is visualized in Figure 3.



Fig. 3. Statistical test for key difference invariant bias in key recovery 

It is well known m that in order for such a test to have error probabilities
of at most $\alpha_0$ and $\alpha_1$, the parameters $\mu_0$, $\sigma_0$, $\mu_1$ and $\sigma_1$ should be such that
$q_{1-\alpha_1} \sigma_1 + q_{1-\alpha_0} \sigma_0 = |\mu_1 - \mu_0|$.

Now, using Proposition 2 and Proposition 3, we obtain the following equation
that determines the amount of data needed by the distinguisher:

$N = \frac{2^{n+0.5} \, (q_{1-\alpha_0} + q_{1-\alpha_1})}{\sqrt{\lambda} - q_{1-\alpha_1} \sqrt{2}}$.    (6)
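
Equation (6) is easy to evaluate numerically. The following sketch takes the equation in the form $N = 2^{n+0.5}(q_{1-\alpha_0} + q_{1-\alpha_1}) / (\sqrt{\lambda} - q_{1-\alpha_1}\sqrt{2})$ and uses the standard-library `statistics.NormalDist` for the quantiles; the function name `data_complexity` is an illustrative choice.

```python
from math import log2, sqrt
from statistics import NormalDist


def data_complexity(n: int, lam: float, alpha0: float, alpha1: float) -> float:
    """Number of texts N per key required by the distinguisher, per Equation (6)."""
    q0 = NormalDist().inv_cdf(1 - alpha0)   # quantile q_{1 - alpha0}
    q1 = NormalDist().inv_cdf(1 - alpha1)   # quantile q_{1 - alpha1}
    denominator = sqrt(lam) - q1 * sqrt(2)
    if denominator <= 0:
        raise ValueError("too few approximations for the requested alpha1")
    return 2 ** (n + 0.5) * (q0 + q1) / denominator
```

With n = 64, $\lambda$ = 225, $\alpha_0 = 2^{-2.7}$ and $\alpha_1 = 2^{-8.5}$ this evaluates to about $2^{62.95}$, and with $\alpha_1 = 2^{-4.5}$ to about $2^{62.29}$, the values used in the attacks of Sections 5 and 6.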
4.2 How to Recover the Key with Key Difference Invariant Bias

Here, we describe a generic key recovery attack approach that can be applied
to block ciphers for which a key difference invariant bias property for r rounds
has been identified. This procedure is described in Algorithm 1. We will feed
this algorithm with the related-key differential paths that are going to be used
for the attack. Other inputs to the algorithm are the number of rounds of
the distinguisher r, the number of rounds $r_{top}$ that we are going to prepend at
the top of the distinguisher and the number of rounds $r_{bot}$ that we are going
to add at the bottom of the distinguisher. In Algorithm 1, V[x] and V'[x'] are
the counters containing the number of times the partial state values x and x'
(values corresponding to the non-zero masks of the linear approximations) occur
for N plaintext-ciphertext pairs under the key pair.


Algorithm 1. Generic Attack Procedure

Require: A set of linear approximations (a, b) and master key difference $\delta = \kappa \oplus \kappa'$
         with the key difference invariant bias property holding.

 1: for all related-key differential paths with a difference $\delta$ on the master key do
 2:   Collect N plaintext-ciphertext pairs (P, C) under a key $\kappa$.
 3:   Collect N plaintext-ciphertext pairs (P', C') under $\kappa' = \kappa \oplus \delta$.
 4:   Partially encrypt $r_{top}$ rounds and partially decrypt $r_{bot}$ rounds, obtain partial
      state values x and x' covered by the input/output masks of (a, b) and compute
      V[x] and V'[x'] (number of times these partial state values occur).
 5:   Allocate a counter s.
 6:   for all linear approximations (a, b) do
 7:     Allocate counters S and S' and set them to zero.
 8:     for all values of x and x' do
 9:       if the linear approximation holds then
10:         Add V[x] and V'[x'] to S and S', respectively.
11:       end if
12:     end for
13:     Compute $s = s + [(S/N - 1/2) - (S'/N - 1/2)]^2$.
14:   end for
15:   if $s \le \tau$ then
16:     The guessed subkey is a possible subkey value.
17:     Check exhaustively the remaining keys against several plaintext-ciphertext pairs.
18:   end if
19: end for
20: return encryption key.
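
Steps 5 to 15 of Algorithm 1 can be sketched as follows. The data layout is an assumption made for this illustration: the partial state x is taken to be a pair (input nibble, output nibble), `V` and `V_prime` are dictionaries mapping such pairs to occurrence counts, and the approximations are all pairs of nonzero 4-bit masks, as in the attacks on LBlock and TWINE. The partial encryption and decryption of Step 4 is cipher-specific and is not modelled.

```python
from itertools import product


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def counter_for(V, a, b):
    """Steps 8-12: number of texts for which the approximation (a, b) holds."""
    return sum(count for (x_in, x_out), count in V.items()
               if parity(a & x_in) ^ parity(b & x_out) == 0)


def statistic_from_tables(V, V_prime, N, mask_bits=4):
    """Steps 5-14: accumulate s over all pairs of nonzero masks."""
    s = 0.0
    nonzero_masks = range(1, 1 << mask_bits)
    for a, b in product(nonzero_masks, repeat=2):
        S = counter_for(V, a, b)
        S_prime = counter_for(V_prime, a, b)
        s += ((S / N - 0.5) - (S_prime / N - 0.5)) ** 2
    return s


def is_candidate(s: float, tau: float) -> bool:
    """Step 15: keep the subkey guess if s does not exceed the threshold."""
    return s <= tau
```

Working from the tables V and V' rather than from the raw texts means that the loop over the $\lambda$ approximations costs at most $\lambda$ times the table size, independently of N.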


5 Attack on 24-round LBlock 

LBlock is a lightweight block cipher presented at ACNS 2011 by Wu and Zhang
[35]. It uses a 64-bit block and an 80-bit key and is based on a modified 32-round
Feistel structure. Its description is provided in the full version of this paper [5].

5.1 Previous Cryptanalysis

Despite its recent proposal, LBlock has already been extensively analyzed. For ex- 
ample, impossible differential attacks have been mounted in the single-key model 
[16l[2Tl[39] as well as attacks in the related-key model [26]. A related-key trun- 
cated differential attack on 22-round LBlock was given in [22] . Some other results 
concern integral cryptanalysis [2U1I551IM1I55] . A zero-correlation linear attack was 
equally mounted against 22 rounds of LBlock [3B]. Finally, biclique attacks fTTHO] 
provide only a small gain against exhaustive search. So the currently best non- 
exhaustive attacks against LBlock can break at most 22 rounds. 

In this paper, we propose an attack on 24 rounds of LBlock. Our results are
summarized and compared to previous cryptanalysis in Table 1.


Table 1. Summary of attacks on LBlock 


Model  Attack             #Rounds  #keys  Data per key     Time       Memory     Ref.
SK     Imp. Diff          20       1      2“ CP            2' 2 -'    2 m        m
       Imp. Diff          21       1      2^62.5 CP        2^78.7     2^55.5     ED
       Imp. Diff          21       1      2^63 CP          2^69.5     2^75       EE]
       Imp. Diff          22       1      2^58 CP          2^79.28    2^76       EE]
       Integral           20       1      2^63.7 CP        2^63.7     N/A        [39]
       Integral           20       1      2^63.6 CP        2^39.6     2^35       m
       Integral           22       1      2^61.6 CP        2^71.2     N/A        m
       Integral           21       1      2^61.6 CP        2^54.1     2^51.58    GS]
       Integral           22       1      2^61 CP          2^70       2^63       [34]
       Zero-Correlation   22       1      2^64 KP          2^70.54    2^64       [36]
       Zero-Correlation   22       1      2^62.1 KP        2^71.27    2^64       m
       Zero-Correlation   22       1      2^60 KP          2^79       2^64       m
RK     Imp. Diff          22       8      2 4 ' RKCP       2 m        N/A        ES]
       Differential       22       2      2^63.1 RKCP      2^67       N/A        [22]
       Key Diff Inv Bias  24       32     2^62.29 RKKP     2^74.59    2^61       Here
       Key Diff Inv Bias  24       32     2^62.95 RKKP     2^70.67    2^61       Here


5.2 Linear Approximations with Key Difference Invariant Bias for 
LBlock 

We start by presenting the linear approximations with key difference invariant
bias under two keys related by a difference on a single nibble of the master key.
These linear approximations, depicted in Figure 4, hold for 16 rounds (from round
5 to round 20) under the related-key differential paths depicted in the full version
of this paper [5]. The input mask of the 5-th round is ($0000\alpha00000000000$) and
the output mask of the 20-th round is ($000000000\beta000000$), $\alpha \neq 0$, $\beta \neq 0$. There
are in total $(2^4 - 1) \cdot (2^4 - 1) \approx 2^{7.81}$ such linear approximations.

We can see from Figure 4 that the relations $\Gamma_r \cdot \Delta K_r = 0$, for $5 \le r \le 20$, hold
for all the related-key differential paths listed in the full version of this paper [5].
Therefore Condition 1 is satisfied, so the linear approximations in Figure 4 have
a key difference invariant bias under the related-key differential paths listed in
the full version of this paper [5].

The related-key differential paths that we used for our attack are presented
in the full version of this paper [5].

$\Gamma_r$, $5 \le r \le 20$: input mask value for the S-boxes in round r.
$\Delta K_r$, $5 \le r \le 20$: the subkey difference in round r.
In masks, '0' and '1' denote a zero and a nonzero mask for a nibble, resp.; a third symbol denotes an arbitrary mask.
In differences, '0' and '1' denote a zero and a nonzero difference for a nibble, resp.; a third symbol denotes an arbitrary difference.

Fig. 4. 16-round linear approximations with key difference invariant bias for LBlock


5.3 Key Recovery for 24-round LBlock 

The 16-round linear approximations with key difference invariant bias that we
used for our attack start before round 5 and end after round 20. The initial
four rounds, round 1 to round 4, are added before the linear approximations
and the final four rounds, round 21 to round 24, are appended after the linear
approximations. The details of this stage, and the nibbles to be computed in the
initial and the final four rounds, are shown in the full version of this paper [5].
For this attack, r = 16, $r_{top} = 4$ and $r_{bot} = 4$. These elements will be input to
Algorithm 1.


Attack Procedure for 24-round LBlock. The attack on LBlock follows
the attack procedure described in Algorithm 1. As noted in the complexity
estimation below, the same plaintext-ciphertext pairs can be reused for different
related-key differential paths. For this reason, Steps 2 and 3
of Algorithm 1 do not have to be executed for every path of Step 1. Step 4
of Algorithm 1 for LBlock is itself composed of 14 consecutive steps. The details
of Step 4 are presented in the full version of this paper [5].

After proceeding from Step 5 to Step 15, we obtain the counter s containing
the $\chi^2$ statistic for the subkey guess. The right value of the guessed 53-bit subkey is
likely to be among the candidates with the statistic s lower than or equal to the
threshold $\tau = \sigma_0 q_{1-\alpha_0} + \mu_0$. All cipher keys it is compatible with are tested
exhaustively against a maximum of 2 plaintext-ciphertext pairs.

Complexity Estimation. We start by evaluating the complexity of Step 4.
From Step 4.1 to Step 4.14, the time complexity is

$T_1 = N \cdot 2^{4} \cdot 2 + 2^{60} \cdot 2^{8} \cdot 2 + \cdots + 2^{28} \cdot 2^{37} \cdot 2 + 2^{24} \cdot 2^{41} \cdot 2 + 2^{20} \cdot 2^{45} \cdot 2 + 2^{16} \cdot 2^{49} \cdot 2 + 2^{12} \cdot 2^{53} \cdot 2$
$\phantom{T_1} = N \cdot 2^{5} + 2 \cdot 2^{69} + 11 \cdot 2^{66}$.

We will compute N by using Equation (6), after choosing the values of $\alpha_0$
and $\alpha_1$. Here, the number of linear approximations is $\lambda = 2^{7.81}$ and n = 64.
Different choices of $\alpha_0$ and $\alpha_1$ will provide a time-data complexity trade-off. We start
by choosing some concrete values for $\alpha_0$ and $\alpha_1$ that lead to an optimized time
complexity. By setting $\alpha_0 = 2^{-2.7}$ and $\alpha_1 = 2^{-8.5}$, we have $q_{1-\alpha_0} \approx 1.02$ and
$q_{1-\alpha_1} \approx 2.77$. In this way $N \approx 2^{62.95}$ (note that the same N (P, C) pairs or
N (P', C') pairs can be reused for different related-key differential paths under
the condition that the master key difference remains the same) and the threshold value
becomes $\tau \approx 2^{-55.02}$. Then, $T_1 \approx 2^{70.95}$ times of 1/8 of a round encryption, which is
equivalent to $2^{63.37}$ 24-round encryptions. Note that the time complexity of the
procedure described in Steps 6-14 is negligible. Under each related-key differential
path, the value of $K_{14 \sim 17}$ is already known, so the time complexity of Steps
16-19 is about $2^{76} \cdot 2^{-8.5} = 2^{67.5}$ 24-round encryptions. Therefore, the
total complexity from Step 2 to Step 18 is about $2^{67.58}$ encryptions. After
proceeding from Step 2 to Step 18, if we do not succeed, this means that the value
of the right key does not belong to the values corresponding to the related-key
differential path tested. We can then use another related-key differential path
to repeat the above attack. All possible values of the master key bits $K_{4 \sim 21}$
are covered by the related-key differential paths, so we can always find the
right key, where in the worst case all the related-key differential paths have to
be tested. So the expected time complexity of our attack on 24-round LBlock is
about $2^{67.58} \cdot [1 + (1 - \frac{1}{16}) + \cdots + (1 - \frac{15}{16})] \approx 2^{70.67}$ 24-round encryptions. The data
complexity is $2^{62.95}$ known plaintexts under each master key, while $2^{60} \cdot 2 = 2^{61}$
bytes of memory are required to store the counters.
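
The arithmetic of this estimate can be replayed with a few lines of Python. The sketch assumes the closed form $T_1 = N \cdot 2^5 + 2 \cdot 2^{69} + 11 \cdot 2^{66}$ counted in units of 1/8 of a round, a final exhaustive step of $2^{76} \cdot \alpha_1$ encryptions per path, and 16 related-key differential paths that are tried one after another until the right one is hit; the function and key names are illustrative.

```python
from math import log2


def lblock_time_estimate(log2_N=62.95, log2_alpha1=-8.5, num_paths=16):
    """Return log2 of the main cost figures of the 24-round LBlock attack."""
    N = 2.0 ** log2_N
    t1 = N * 2 ** 5 + 2 * 2.0 ** 69 + 11 * 2.0 ** 66     # units of 1/8 round
    step4 = t1 / (8 * 24)                                # 24-round encryptions
    final = 2.0 ** 76 * 2.0 ** log2_alpha1               # exhaustive check of survivors
    per_path = step4 + final
    # The j-th path is only reached if the first j paths failed: 1 + 15/16 + ... + 1/16.
    expected_paths = sum(1 - j / num_paths for j in range(num_paths))
    return {
        "t1": log2(t1),
        "step4": log2(step4),
        "per_path": log2(per_path),
        "expected": log2(per_path * expected_paths),
    }
```

With the default arguments the four values come out close to 70.95, 63.37, 67.58 and 70.67; calling it with `log2_N=62.29` and `log2_alpha1=-4.5` gives an expected cost of about $2^{74.59}$.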

Another possible choice of $\alpha_0$ and $\alpha_1$ can lead to a different time-data complexity
trade-off. For example, if we set $\alpha_0 = 2^{-2.7}$ and $\alpha_1 = 2^{-4.5}$, then $q_{1-\alpha_0} \approx 1.02$
and $q_{1-\alpha_1} \approx 1.70$, and we get $N \approx 2^{62.29}$. For these parameters the expected time
complexity is about $2^{74.59}$ encryptions and the expected data complexity is $2^{62.29}$
known plaintexts for each master key. The memory requirements are the same
as in the previous attack.

Other possible time-data trade-offs with $\alpha_0 = 2^{-2.7}$ for the attack on LBlock
are visualized in Figure 6.

6 Attack on 27-round TWINE-128 

TWINE is a lightweight block cipher proposed by Suzaki, Minematsu, Morioka 
and Kobayashi in [37] . Its structure is based on a modified Type-2 generalized 
Feistel scheme. The cipher’s description is given in the full version of this paper. 

6.1 Previous Cryptanalysis 

In the original proposal of TWINE [37], the authors analyze the resistance of 
TWINE against various types of attacks, such as impossible differential and 
saturation attacks. The best analysis in this proposal is an impossible differential 
attack against 23 rounds of TWINE-80 and against 24 rounds of TWINE-128. 
Moreover, biclique attacks have been mounted in m for both full-round versions 
of TWINE, but the time complexity of these attacks is only marginally lower 
than exhaustive search. 


6.2 Linear Approximations with Key Difference Invariant Bias for 
TWINE- 128 

We present 17-round (from round 6 to round 22) linear approximations with key
difference invariant bias under related-key differential paths for TWINE-128 in
Figure 5. In our attack, the input mask of the 6-th round is $00000000000\alpha000$ and
the output mask of the 22nd round is $0000000\beta000000000$, $\alpha, \beta \neq 0$. Thus, there
are $15 \cdot 15 \approx 2^{7.81}$ such linear approximations, exactly as in the case of LBlock.
We start by describing the related-key truncated differential path that we use
in our attack. This differential path was found by considering differences
in only one nibble of the master key and by searching exhaustively over all such
configurations.

This path is described in the full version of this paper [5]. More precisely,
we consider a difference equal to 1 in the 22nd nibble of the master key. This
differential path covers all the possible key values and is sufficient to recover the
right key value. From Figure 5 we can see that $\Gamma_r \cdot \Delta K_r = 0$, $6 \le r \le 22$ (where
$K_r$ and $\Delta K_r$ denote the subkey value and the subkey difference for round r,
respectively) and thus Condition 1 is satisfied.


6.3 Key Recovery for 27-round TWINE-128 

We utilize the 17-round distinguisher in Figure 5 to attack 27 rounds of
TWINE-128. The initial five rounds from round 1 to round 5 are added before the
distinguisher and the final five rounds from round 23 to round 27 are appended after
the distinguisher, as shown in the full version of this paper. In such a way, the
first 27 rounds of TWINE-128 are covered. The attack proceeds by following
Algorithm 1. The parameters are r = 17, $r_{top} = 5$, $r_{bot} = 5$; see the full version
of this paper.

After proceeding from Step 5 to Step 15, we obtain the counter s containing
the $\chi^2$ statistic for the subkey guess. The right value of the guessed 96-bit subkey is
likely to be among the candidates with the statistic s lower than or equal to the
threshold $\tau = \sigma_0 q_{1-\alpha_0} + \mu_0$. All cipher keys it is compatible with are tested
exhaustively against a maximum of 2 plaintext-ciphertext pairs.


Complexity Estimation. We start by evaluating the complexity T\ of Steps 
4.1-4.17. Ti = iV -2 20 - 2+iV- 2 32 - 15- 2+ A7 -2 40 - 15- 2-+-2 60 - 2 44 - 2- 15 + 2 56 - 2 48 -2-15-(- 
252 . 2 52 . 2 - 15 + 2 48 . 2 56 - 2 - 15-F2 44 - 2 60 - 2 - 15-F2 40 - 2 64 - 2 - 15-F2 36 - 2 68 - 2 - 15 -|-2 36 - 2 72 - 
2-15 + 2 32 -2 76 -2-15 + 2 28 -2 80 -2-15-|-2 24 -2 84 -2- 15 + 2 20 -2 88 -2-15-|-2 :16 *2 92 -2* 15 + 
2 12 -2 96 -2-15 = W2 20 -2 + jV-2 32 -15-2 + jV-2 40 -15-2 + 7-2 104 -2-15 + 7-2 108 -2-15. 

To compute N, we will use Equation d5j). Here, the number of linear approx- 
imations is A = 2 7 81 and n = 64. Therefore N will be computed after choosing 
the values of ao and a±. Different choices of these values will provide a data-time 
trade-off. We start by choosing some concrete values for ao and aq that lead to 
an optimized time complexity. 

Consider for example $\alpha_0 = 2^{-2.7}$ and $\alpha_1 = 2^{-8.5}$. Then
$q_{1-\alpha_0} \approx 1.02$ and $q_{1-\alpha_1} \approx 2.77$. Substituting
these values into Equation (5), we obtain $N \approx 2^{62.95}$. The threshold
value becomes $\tau \approx 2^{-55.02}$. Thus $T_1 \approx 2^{115.81}$
operations, each costing 1/8 of a round encryption, which is equivalent to
$2^{108.05}$ 27-round encryptions. The complexity of computing the counters
$S$ and $S'$ is negligible. The complexity of the last step is
$2^{128} \cdot 2^{-8.5} = 2^{119.5}$ 27-round encryptions. Thus the total
time complexity of the attack is about $2^{119.5}$ 27-round TWINE-128
encryptions. The data complexity is $N \approx 2^{62.95}$ known plaintexts per
key and the memory requirements are $2^{61}$ bytes to store the counters.
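
The arithmetic in this estimate can be re-checked numerically. The following is a minimal sketch, not the authors' code: it plugs $N = 2^{62.95}$ into the closed form of $T_1$ and converts the result to 27-round encryptions. The constant `OPS_PER_ROUND = 8` encodes the assumption that one counted operation costs 1/8 of a round, and all variable names are illustrative.

```python
from math import log2

# Values quoted in the text for the time-optimized parameter choice.
N = 2 ** 62.95
ROUNDS = 27
OPS_PER_ROUND = 8  # assumption: one counted operation costs 1/8 of a round

# The three data-dependent terms of T_1.
data_terms = N * 2**20 * 2 + N * 2**32 * 15 * 2 + N * 2**40 * 15 * 2
# The fourteen remaining terms collapse to 7 * 2^104 and 7 * 2^108 (times 2 * 15).
guess_terms = 7 * 2**104 * 2 * 15 + 7 * 2**108 * 2 * 15

T1 = data_terms + guess_terms
T1_in_encryptions = T1 / (OPS_PER_ROUND * ROUNDS)

# Final exhaustive step: 2^128 keys filtered by alpha_1 = 2^-8.5.
last_step = 2**128 * 2**-8.5
total = T1_in_encryptions + last_step

print(f"log2(T1)                 = {log2(T1):.2f}")
print(f"log2(T1 in encryptions)  = {log2(T1_in_encryptions):.2f}")
print(f"log2(total time)         = {log2(total):.2f}")
```

Under these assumptions the final exhaustive step dominates, which is why the total time complexity is governed by the choice of $\alpha_1$.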
Fig. 5. 17-round linear approximations for key difference invariant bias for TWINE-128.
Legend: vectors $\Gamma_r$, $6 \le r \le 22$: input mask value of S-box; green
nibbles: nibbles with nonzero difference in the subkeys; blue nibbles: nibbles
with nonzero mask; yellow nibbles: nibbles with undetermined mask; white
nibbles: nibbles with zero mask or zero subkey difference.

Fig. 6. Data-time trade-off for the attack on 24-round LBlock (axis label:
log2(Time Complexity)).

Fig. 7. Data-time trade-off for the attack on 27-round TWINE-128 (axis label:
log2(Time Complexity)).


In the same way, if we want to optimize the data complexity, we choose
$\alpha_0 = 2^{-2.7}$ and $\alpha_1 = 2^{-4.5}$. Then
$q_{1-\alpha_0} \approx 1.02$ and $q_{1-\alpha_1} \approx 1.70$. Equation (5)
now gives $N = 2^{62.29}$ and the threshold is $2^{-54.38}$. The time
complexity of the attack is $2^{123.5}$ and the data complexity is
$N = 2^{62.29}$ known plaintexts per key. Figure 7 depicts different possible
data-time trade-offs with $\alpha_0 = 2^{-2.7}$.
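
The quantiles and the cost of the final exhaustive step for both parameter choices can be reproduced with the standard normal distribution. This is a small sketch using Python's standard library; the function names are illustrative, and reading $q_{1-\alpha}$ as the standard normal quantile is an assumption based on the quoted values.

```python
from statistics import NormalDist

KEY_BITS = 128
std_normal = NormalDist()  # mean 0, standard deviation 1

def quantile(alpha_log2):
    """Return q_{1-alpha} of the standard normal for alpha = 2**alpha_log2."""
    return std_normal.inv_cdf(1 - 2.0 ** alpha_log2)

def final_step_log2(alpha1_log2):
    """log2 of the exhaustive-search cost 2^128 * alpha_1."""
    return KEY_BITS + alpha1_log2

for label, a0, a1 in [("time-optimized", -2.7, -8.5),
                      ("data-optimized", -2.7, -4.5)]:
    print(label,
          f"q(1-a0)={quantile(a0):.2f}",
          f"q(1-a1)={quantile(a1):.2f}",
          f"log2(final step)={final_step_log2(a1)}")
```

A larger $\alpha_1$ lowers the required data (through the smaller quantile) but leaves more wrong keys for the final exhaustive step, which is the trade-off plotted in Figure 7.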


7 Conclusions 

In this paper, we reveal the fundamental property of key difference invariant 
bias in key-alternating block ciphers. We show how to identify this property effi- 
ciently. We propose a statistical distinguisher for the property and demonstrate 
the property for 5 rounds of AES. As an illustration, using our novel crypt- 
analytic technique, under related keys, we attack more rounds of LBlock and 
TWINE than the best previous cryptanalysis. 
Abstract. ALE is a new authenticated encryption algorithm published
at FSE 2013. The authentication component of ALE is based on the
strong Pelican MAC, and the authentication security of ALE is claimed
to be 128-bit. In this paper, we propose the leaked-state-forgery attack
(LSFA) against ALE by exploiting the state information leaked from the
encryption of ALE. The LSFA is a new type of differential cryptanalysis
in which part of the state information is known and exploited to improve
the differential probability. Our attack shows that the authentication
security of ALE is only 97-bit, and the results may be further improved
to around 93-bit if the whitening key layer is removed. We implemented
our attacks against a small version of ALE (using 64-bit block size
instead of 128-bit block size). The experimental results match well with
the theoretical results.

Keywords: authenticated encryption, forgery attack, ALE. 

1 Introduction 

Confidentiality and message authentication are two fundamental goals in
cryptography. In symmetric key cryptography, a block cipher/stream cipher is used
to protect the confidentiality of messages; and a message authentication code 
(MAC) is used to authenticate messages. In the widely used Transport Layer 
Security (TLS), the MAC-then-Encrypt approach is used: HMAC [57] is applied 
to authenticate the TCP packets, and AES [5] in CBC mode [5S] can be used to 
encrypt the payload of TCP packets. 

In many applications, both confidentiality and message authentication are
required. An authenticated encryption algorithm achieves encryption and
authentication simultaneously, and its performance is much better than the
combination of separate encryption and authentication. Authenticated
encryption has received considerable research interest in recent years. A
number of block cipher based authenticated encryption modes have been
proposed, e.g., IAPM [21], OCB, CCM, CWC, GCM, EAX, HBS, BTM and McOE [15].
ISO/IEC 19772:2009 [17] standardized several modes, including EAX, CCM, GCM
and OCB 2.0. Besides the authenticated encryption modes, several
authenticated encryption algorithms have been proposed, such as Helix [14],
Phelix [30], Hummingbird-2 [13], ASC-1 [20], the 3GPP algorithm 128-EIA3 and
Grain-128a [3]. The coming competition CAESAR (Competition for Authenticated
Encryption: Security, Applicability and Robustness) [7] is expected to
attract many new authenticated encryption algorithms.

ALE. ALE (Authenticated Lightweight Encryption) is an AES-based
authenticated encryption algorithm proposed by Bogdanov et al. at FSE 2013
[6]. It is designed for low-cost embedded systems (such as RFID tags and
smart cards) and provides single-pass authenticated encryption with
associated data. The keystream generation of ALE uses the idea of the LEX
stream cipher [5], and the tag generation uses the idea of Pelican MAC [10].
It has a 256-bit internal state and aims to have a probability of success of
at most $2^{-128}$ for a forgery attack.

Pelican MAC is an extremely simple MAC based on AES. In Pelican MAC,
any difference introduced in the forgery attack passes through at least
four AES rounds. This ensures that the success rate of a forgery attack is at
most $2^{-128}$. The state size of Pelican MAC is only 128 bits. The small
state size means that the number of messages authenticated under the same
key should be less than $2^{64}$. Yuan et al. delivered a state recovery
attack against the Pelican MAC by exploiting the state collision when more
than $2^{64}$ authentication tags are generated from the same key [33]. The
attack given in [33] cannot be applied to ALE. In ALE, the state size is
increased to 256 bits, and a new nonce is needed for generating each
authentication tag when the same key is used.

The stream cipher LEX is based on AES, and four keystream bytes are
extracted from the AES state after each round. LEX suffers from two attacks.
The slide attack against LEX recovers the key with negligible complexity when
around $2^{60}$ nonces are used with the same secret key [31]. Another attack
recovers the key with around $2^{100}$ simple operations and $2^{40}$
keystream bytes [11,12]. ALE is not vulnerable to these two attacks due to its
large state and the changing AES round keys (the round keys in LEX are fixed
for the same key).

The design of ALE is similar to the authenticated encryption algorithm
ASC-1. In ASC-1, a leaked byte is protected by an additional key byte before
it is extracted as a keystream byte. However, the additional key byte is not
used in ALE for better hardware efficiency. Unfortunately, the lack of
additional key bytes in ALE allows part of the AES state to be leaked as
keystream, and such leaked state information can be exploited to improve the
forgery attack, as demonstrated in this paper.

In this paper, we propose a new attack, the leaked-state-forgery attack
(LSFA), against ALE. The general idea of this attack is to exploit the leaked
state information so as to increase the differential probability. For ALE,
there exist four-round AES differential characteristics with probability much
larger than $2^{-128}$ after taking into account the leaked state
information. The forgery attack against ALE can reach a success rate of
$2^{-97}$, which is $2^{31}$ times higher than the claimed probability. We
show that the results may be further improved if the whitening key layer is
removed. We implemented our attack on a small version of ALE, in which a
64-bit block and a 4-bit-to-4-bit S-box are used. The experimental results
match well with the theoretical results.

Very recently, Khovratovich and Rechberger independently proposed an
attack against ALE at SAC 2013 [22], which also exploits the weakness of the
ALE scheme. However, we notice that their attack is applied to a variant of
ALE in which the four bytes are leaked after SubBytes. In this work, we
optimized the differential characteristics used in our attacks so that lower
complexities can be obtained.

This paper is organized as follows. The specification of ALE is given in Sect. 
2. Section 3 describes a basic forgery attack against ALE. Section 4 optimizes 
the forgery attack. Section 5 discusses the effect of removing the whitening key 
layer of four-round AES. Section 6 gives the experimental results on ALE with 
reduced block size. Section 7 concludes this paper. 

2 The Specification of ALE 

In this section, we give a brief description of ALE. The full specification of
ALE can be found in the original paper [6].

AES round function. AES-128 is used as an underlying primitive of ALE. A full 
specification of AES can be found in [9] . There are four operations in an AES round: 
SubBytes (SB) , ShiftRows (SR) , MixColumns (MC) and AddRoundKey (ARK) . 

AESRound(State, ExpandedKey[i])
{
    SubBytes(State);
    ShiftRows(State);
    MixColumns(State);
    AddRoundKey(State, ExpandedKey[i]);
}



Fig. 1. The positions of the leaked bytes in the even and odd rounds of LEX
LEX keystream extraction. In the stream cipher LEX, AES round functions are
repeatedly applied to a state (the subkeys are fixed). At the end of each AES
round, 4 bytes from the state are extracted as the keystream [5]. The positions
of leaked bytes are shown in Fig. 1.

Pelican MAC. In the Pelican MAC, each 128-bit message block is xored to 
a secret 128-bit state, then the state passes through 4 AES rounds. In Pelican 
MAC, each difference passes through at least 25 active S-boxes (following directly 
from the analysis of AES), thus Pelican MAC provides strong security against 
forgery attack. 

Specification of ALE. The encryption/authentication of ALE is shown in Fig. 2.
The processing of associated data and of the last partial block is omitted
here. The encryption component of ALE is based on LEX, and its authentication
component is based on Pelican MAC. A different nonce is used in ALE for the
protection of every message. When the verification fails, the plaintext from
the decryption should be kept secret so as to prevent a state recovery attack.
To encrypt/authenticate a message, ALE takes a 128-bit master key $\kappa$, a
message $\mu$, associated data $\alpha$ and a 128-bit non-zero nonce $\nu$ as
inputs, and it outputs a ciphertext $\gamma$ of the same length as the message
and a 128-bit tag $\tau$. The initialization of ALE is as follows: the nonce
$\nu$ is encrypted using AES-128 under the master key $\kappa$. The 128-bit
output is used as the initial key state. A message with value 0 is encrypted
using AES-128 under the master key $\kappa$ to give the data state. The
128-bit output $AES_\kappa(0)$ is encrypted again using the initial key state
as the key. The key state is updated by applying the round key schedule of
AES-128 to the final round key of the last AES encryption with round constant
$x^{10}$ in $F_{2^8}$.

Fig. 2. Encryption and authentication of ALE

To process a 16-byte message block, the data state is encrypted with 4 rounds
of AES using the key state as key. 16 bytes are leaked from the data state in
the 4 AES rounds in accordance with the LEX keystream extraction. According to
the code provided by the authors of ALE, five round keys are used during the 4
AES rounds, namely, an initial whitening key is used. At each AES round, four
bytes are leaked after the AddRoundKey() function. The leak is xored to the
current 16-byte block M for encryption. The final round subkey is updated one
more time using the AES round key schedule with byte round constant $x^4$ in
$F_{2^8}$ to get the key state. The current message block M is xored to the
data state so that it passes through the next 4 AES rounds for authentication
purposes (similar to that in Pelican MAC).
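
The round constants mentioned above are powers of $x$ in the AES field. As a small illustration (a sketch with illustrative function names, not code from the ALE reference implementation), the byte values of $x^4$ and $x^{10}$ can be computed by repeated multiplication by $x$ modulo the AES polynomial $x^8 + x^4 + x^3 + x + 1$:

```python
def xtime(b):
    """Multiply a byte by x in F_{2^8} = F_2[x] / (x^8 + x^4 + x^3 + x + 1)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce by the AES polynomial
    return b

def x_power(e):
    """Return x^e in F_{2^8} as an integer in 0..255."""
    value = 1
    for _ in range(e):
        value = xtime(value)
    return value

# AES-128 itself uses x^0 .. x^9 as round constants for its ten round keys.
aes_rcon = [x_power(e) for e in range(10)]

print([hex(c) for c in aes_rcon])
print(hex(x_power(4)), hex(x_power(10)))
```

Since AES-128 consumes $x^0, \ldots, x^9$, the constant $x^{10}$ is the next unused one after a full encryption, and $x^4$ is the next one after the four rounds that process a block.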

The decryption/verification is similar to the encryption/authentication,
except that the ciphertext block is xored to the keystream to get the message,
as shown in Fig. 3. We provide this figure here since the
decryption/verification is important in our attack.



Fig. 3. Decryption and verification of ALE 


The designers of ALE claim that any forgery attack not involving key
recovery/internal state recovery has a success probability of at most
$2^{-128}$. It is stated that each secret key is used to protect at most
$2^{48}$ message bits. Such a restriction on message bits does not affect the
success rate of our forgery attack.

3 A Basic Leaked-State Forgery Attack on ALE 

In this section, we present a basic forgery attack against ALE. The chance of
a successful forgery attack is $2^{-106}$, which is $2^{22}$ times larger than
the claimed success rate $2^{-128}$. This attack requires $2^{41}$ known
plaintext blocks.


3.1 The Main Idea of the Attack 

The following property of active S-box will be used in our attack: 

Property 1. For an active S-box, if the values of an input and the input/output 
difference are known, the output/input difference is known with probability 1. 

Here the active S-box is the S-box with non-zero input difference. In the rest 
of the paper, we will use a new term active leaked byte to denote a leaked byte 
with difference on it. 
In the security analysis of Pelican MAC [10] and ALE [6], the probability of a
four-round differential characteristic of ALE follows the analysis of AES. It
has been shown that for any four-round AES differential characteristic, the
number of active S-boxes is at least 25 [5]. For each S-box, the differential
probability is at most $2^{-6}$. Hence, there is a trivial upper bound for the
four-round AES differential probability, which is $2^{-150}$. However,
different from the Pelican MAC, 4 state bytes are leaked at the end of every
round in ALE. Using Property 1, it is possible to bypass some active S-boxes
with probability 1 when the input bytes to those active S-boxes are leaked. It
means that the overall differential probability could be significantly
increased.


3.2 Finding a Differential Characteristic 

The first step of the attack is to find a valid four-round AES differential charac- 
teristic which passes through 25 (or close to 25) active S-boxes and the differences 
pass through several leaked bytes in the first three rounds. 

There are many differential characteristics for four AES rounds. To categorize 
those differential characteristics, we use the number of active bytes before the S- 
box layer in each round to represent a certain type of differential characteristics. 
For example, the differential characteristic shown in Fig. 4 falls in the type
“1-4-16-4”. Note that the positions of active bytes are not unique for each type.



Fig. 4. An example of 1-4-16-4 differential characteristic. Gray squares denote leaked 
bytes. Squares marked with broken line denote active bytes. 

In our basic attack, we use the type of differential characteristic shown in
Fig. 4. There are 25 active S-boxes in the differential characteristic, and 8
active leaked bytes are located in the first three rounds.

Next we need to find a differential characteristic with high probability. Note
that it is not always guaranteed that the differential probability of each
active S-box can reach the maximum value $2^{-6}$. The AES S-box has the
property that for a nonzero input difference $\delta_1$ and a random output
difference $\delta_2$, the probability that the equation
$S(x) \oplus S(x \oplus \delta_1) = \delta_2$ has a solution is 127/256. Among
the 127 possible output differences, 126 have probability $2^{-7}$ and only
one has probability $2^{-6}$. Hence, for an active S-box, there is a unique
output difference that reaches the probability $2^{-6}$ for difference
propagation. This shows that requiring active S-boxes to have difference
propagation probability $2^{-6}$ will limit the number of choices for the
possible differential characteristics.
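
The S-box property used here can be checked exhaustively. The sketch below rebuilds the AES S-box from its standard definition (field inversion followed by the affine map) and tabulates, for a given input difference, how many inputs lead to each output difference. The helper names are illustrative and not taken from the paper.

```python
from collections import Counter

def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in F_{2^8}, with 0 mapped to 0 (computed as a^254)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, k):
    """Rotate a byte left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

def aes_sbox(x):
    """AES S-box: field inversion followed by the affine transformation."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

SBOX = [aes_sbox(x) for x in range(256)]

def ddt_row(delta_in):
    """Map each reachable output difference to its number of solutions x."""
    return Counter(SBOX[x] ^ SBOX[x ^ delta_in] for x in range(256))

row = ddt_row(0x96)               # 0x96 is the input difference used in Fig. 5
profile = Counter(row.values())   # how many output differences have 2 or 4 solutions
print(len(row), dict(profile))
```

An output difference with 2 solutions corresponds to probability $2/256 = 2^{-7}$ and one with 4 solutions to $2^{-6}$, so the profile of 126 entries with two solutions and a single entry with four solutions is exactly the statement in the text.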

It is thus not surprising that we found no differential characteristic such
that every active S-box (except those involving the leaked ones) has the
maximum differential probability $2^{-6}$ after testing all the possible
positions of the type “1-4-16-4”. In order to find a differential
characteristic, we need to allow some active S-boxes with differential
probability $2^{-7}$. We managed to find a number of differential
characteristics. One of them is given in Fig. 5, and we will use this
differential characteristic to demonstrate our basic attack. The differential
probability of this differential characteristic is
$2^{-6 \times 16 + (-7) \times 9} = 2^{-159}$ (differential probability
$2^{-6}$ for 16 active S-boxes, $2^{-7}$ for 9 active S-boxes).

Fig. 5. A differential characteristic of type “1-4-16-4”. The hexadecimal numbers
indicate the difference values. The empty squares indicate no difference. The squares
of leaked bytes are marked with gray color.

Three differences in Fig. 5 will be used in our attack: the input difference
$\Delta_{in}$, the output difference $\Delta_{out}$ and the keystream
difference $\Delta_s$:

$\Delta_{in}$ = (0,0,0,0; 0,0,0,0; 0,0,0,0; 0,96,0,0);

$\Delta_{out}$ = (B1,DE,6F,6F; 0,0,0,0; B8,5C,82,55; 0,0,0,0);

$\Delta_s$ = (0,0,E,F3; 59,37,6E,F2; 0,81,6C,0; 0,0,0,0).

Note that the values in $\Delta_s$ are obtained by simply concatenating the
bytes extracted from the states. The order of those bytes has no effect on the
attack, as long as this order is fixed.


3.3 Launching the Forgery Attack 

After finding a four-round AES differential characteristic, we need to
determine the values of the leaked bytes on the differential characteristic so
as to improve the differential probability. The values of the leaked bytes are
important for locating the ciphertext bytes that will be modified in the
forgery attack.

In the differential characteristic shown in Fig. 5, the differences at the
positions of leaked bytes are known before and after the S-box. Hence, we
solve for the values of the active leaked bytes. There are either two or four
possible solutions depending on the output difference. We store the possible
values of leaked bytes in a table T (tabulated in Appendix A). Notice that we
ignore the conditions on the leaked bytes in the fourth round because, for any
leaked values at the end of Round 3, we can always derive the corresponding
difference in Round 4.

If the value of a keystream block $s_i$ falls into one of the possible values
of table T, we modify the previous ciphertext block $c_{i-1}$ and the current
ciphertext block $c_i$ using the differences given in Fig. 5. More
specifically, $c'_{i-1} = c_{i-1} \oplus \Delta_{in}$ and
$c'_i = c_i \oplus \Delta_{out} \oplus \Delta_s$. The modified ciphertext is
sent for decryption/verification.

We illustrate here how the above attack works. From the decryption, the
difference $\Delta m_{i-1} = (c_{i-1} \oplus s_{i-1}) \oplus (c'_{i-1} \oplus s'_{i-1}) = \Delta_{in}$
because $\Delta s_{i-1} = 0$; the difference
$\Delta m_i = (c_i \oplus s_i) \oplus (c'_i \oplus s'_i) = \Delta_{out}$
because $c'_i \oplus c_i = \Delta_{out} \oplus \Delta_s$. Then
$\Delta m_{i-1}$ is introduced to the data state, and after four rounds,
$\Delta m_i$ is introduced to cancel the difference in the state. The
difference propagation follows that in Fig. 5.
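
The XOR bookkeeping of the forgery can be illustrated with a toy model that ignores AES entirely and only tracks how the ciphertext modification translates into message differences. This is a sketch under stated assumptions: the keystream blocks are random stand-ins, the keystream difference of block $i$ is assumed to equal $\Delta_s$ (which holds only when the characteristic is followed), and the single hex digit "E" in $\Delta_s$ is read as 0x0E.

```python
import os

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Differences quoted in the text (16 bytes each).
DELTA_IN = bytes.fromhex("00000000" "00000000" "00000000" "00960000")
DELTA_OUT = bytes.fromhex("B1DE6F6F" "00000000" "B85C8255" "00000000")
DELTA_S = bytes.fromhex("00000EF3" "59376EF2" "00816C00" "00000000")  # "E" read as 0x0E (assumption)

def forge(c_prev, c_cur):
    """Modify two consecutive ciphertext blocks as in the basic attack."""
    return xor(c_prev, DELTA_IN), xor(xor(c_cur, DELTA_OUT), DELTA_S)

# Toy encryption: random stand-ins for keystream and message blocks.
s_prev, s_cur = os.urandom(16), os.urandom(16)
m_prev, m_cur = os.urandom(16), os.urandom(16)
c_prev, c_cur = xor(m_prev, s_prev), xor(m_cur, s_cur)

c_prev_f, c_cur_f = forge(c_prev, c_cur)

# Toy decryption: block i-1 sees the same keystream, block i sees s_i XOR DELTA_S.
m_prev_f = xor(c_prev_f, s_prev)
m_cur_f = xor(c_cur_f, xor(s_cur, DELTA_S))

delta_m_prev = xor(m_prev, m_prev_f)
delta_m_cur = xor(m_cur, m_cur_f)
print(delta_m_prev.hex(), delta_m_cur.hex())
```

The two printed differences are $\Delta_{in}$ and $\Delta_{out}$: the first injects the difference into the data state, and the second cancels it four rounds later, so the tag is unchanged whenever the characteristic holds.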

Complexity of the Attack. In the attack above, the differential probability
of the differential characteristic is $2^{-159}$ before considering the leaked
bytes. There are eight leaked bytes involved in the differential
characteristic, with 5 of them entering active S-boxes with probability
$2^{-7}$ and another 3 entering active S-boxes with probability $2^{-6}$.
According to Property 1, the differential probabilities of those eight active
S-boxes involving the leaked bytes become 1. The overall differential
probability becomes
$2^{-159} \times 2^{7 \times 5} \times 2^{6 \times 3} = 2^{-106}$. The success
rate of the above attack is thus $2^{-106}$.

In this attack, eight leaked keystream bytes are considered, and the values of
6 leaked bytes (from the first two rounds) should be one of the 128 entries in
Table T (as explained above). A random keystream block satisfies the
requirement with probability $128 / 2^{6 \times 8} = 2^{-41}$. We thus need
$2^{41}$ known plaintext blocks in this attack.
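
The exponents in this complexity estimate follow from simple bookkeeping, shown here as a short sketch. All quantities are base-2 logarithms and the names are illustrative.

```python
from math import log2

def log2_characteristic(n_prob6, n_prob7):
    """log2 probability of a characteristic with the given S-box counts."""
    return -6 * n_prob6 - 7 * n_prob7

# 25 active S-boxes: 16 with probability 2^-6 and 9 with probability 2^-7.
log2_full = log2_characteristic(16, 9)

# Eight S-boxes are bypassed via leaked bytes: five at 2^-7 and three at 2^-6.
log2_effective = log2_full + 7 * 5 + 6 * 3

# Gain over the claimed forgery probability 2^-128.
log2_gain = log2_effective - (-128)

# 128 admissible table entries among all values of 6 leaked bytes (48 bits).
log2_keystream_prob = log2(128) - 6 * 8

print(log2_full, log2_effective, log2_gain, log2_keystream_prob)
```

The number of known plaintext blocks needed is the reciprocal of the keystream probability, which gives the $2^{41}$ blocks stated above.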

4 Optimizing the Leaked-State-Forgery Attack against 
ALE 

In this section, we optimize the LSFA against ALE. In Sect. 4.1, we improve
the success rate of the forgery attack. The optimal success rate of a forgery
attack can reach $2^{-97}$, while $2^{56}$ known plaintext blocks are needed.
In Sect. 4.2, the number of known plaintext blocks is reduced to $2^{8.4}$ for
achieving a success rate of $2^{-102}$. Note that the known plaintext blocks
can be related to different keys or different nonces.
4.1 Improving the Differential Probability

From the attack presented in Sect. 3, we notice that the success rate of the
forgery attack is determined by the probability of the differential
characteristic after taking into account the leaked bytes. To evaluate the
success rate of the forgery attack against ALE, we use the term effective
active S-boxes to represent the active S-boxes which cannot be bypassed by
exploiting the leaked bytes. In the following, we will analyze different cases
to find the smallest number of effective active S-boxes.

We start with recalling some properties of the AES round function. The func- 
tion MixColumns has a property that if it is active, the total number of active 
bytes in the input and output will be at least five (the property of the maximum 
distance separable code). By referring to the Lemma 9.4.1 from [9], we have the 
following lemma. 

Lemma 1. The number of active S-boxes of any two-round AES differential
characteristic is lower bounded by $5N$, where $N$ is the number of active
columns in the first round.

In the four AES rounds in ALE, there are 16 leaked bytes. But the leaked
bytes from the fourth round cannot be exploited in the attack as they do not
pass through S-boxes directly. Therefore only the leaked bytes in the first
three rounds can be exploited, and there are at most 12 active leaked bytes.
We use $[l_1, l_2, l_3]$ to indicate the number of active leaked bytes in the
first three rounds, respectively. For instance, the number of active leaked
bytes in the differential characteristic in Fig. 5 is [2,4,2]. We use $n_i$
($i = 1, 2, 3, 4$) to denote the number of active S-boxes at each S-box layer,
which will be used in later analysis.

In the following, we will analyze differential characteristics with the
smallest number of effective active S-boxes, using the techniques of solving
Mixed-Integer Linear Programming (MILP) problems [25,32]. MILP is a useful
technique for proving security bounds against differential cryptanalysis, by
evaluating the minimum number of active S-boxes in several rounds of
encryption. Designers and cryptanalysts only need to write out simple
(in)equalities that are input into an MILP solver, and an optimal solution
will be returned.

We denote by $X_i$ the input state of round $i$; then we have
$X_{i+1} = ARK \circ MC \circ SR \circ SB(X_i)$, where $i \in \{1,2,3,4\}$.
Let $X_{i,j}$ be the $j$-th byte of $X_i$, where $0 \le j \le 15$. For a
further step, suppose $Y_i = SB(X_i)$, $Z_i = SR(Y_i)$ and $W_i = MC(Z_i)$. We
introduce a function $\chi$ to capture whether a byte is nonzero, that is,
$\chi(x) = 1$ if $x \ne 0$ and $\chi(x) = 0$ if $x = 0$. Here, the value of
$\chi(x)$ is a real number. Then, according to the techniques given in
[25,32], the problem of evaluating the minimum number of effective active
S-boxes is translated to an MILP problem as follows.

The Objective Function. The objective function is to minimize the value of

$$\sum_{i=1}^{4} \sum_{j=0}^{15} \chi(\Delta X_{i,j}) - \sum_{k \in \{0,2,8,10\}} \left( \chi(\Delta X_{2,k}) + \chi(\Delta X_{4,k}) \right) - \sum_{l \in \{4,6,12,14\}} \chi(\Delta X_{3,l}), \tag{1}$$

since we would like to evaluate the minimum number of effective active S-boxes.
In (1), the number of effective active S-boxes is obtained by first counting the
number of active S-boxes in four consecutive rounds of AES and then subtracting
the number of active leaked bytes.


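Constraints. According to the property of MixColumns, we have
$\sum_{j=4k}^{4k+3} \left( \chi(\Delta Z_{i,j}) + \chi(\Delta W_{i,j}) \right) = 0$
or $\ge 5$, where $1 \le i \le 4$ and $0 \le k \le 3$. On the other hand, we
have $\chi(\Delta Z_{i,j}) = \chi(\Delta Y_{i,5j \bmod 16}) = \chi(\Delta X_{i,5j \bmod 16})$
and $\chi(\Delta X_{i+1,j}) = \chi(\Delta W_{i,j})$ ($0 \le j \le 15$). Thus,
two consecutive rounds of AES provide us four constraints:

$$5 d_{i,1} \le \sum_{j=0}^{3} \left( \chi(\Delta X_{i,5j \bmod 16}) + \chi(\Delta X_{i+1,j}) \right) \le 8 d_{i,1}, \tag{2}$$

$$5 d_{i,2} \le \sum_{j=4}^{7} \left( \chi(\Delta X_{i,5j \bmod 16}) + \chi(\Delta X_{i+1,j}) \right) \le 8 d_{i,2}, \tag{3}$$

$$5 d_{i,3} \le \sum_{j=8}^{11} \left( \chi(\Delta X_{i,5j \bmod 16}) + \chi(\Delta X_{i+1,j}) \right) \le 8 d_{i,3}, \tag{4}$$

$$5 d_{i,4} \le \sum_{j=12}^{15} \left( \chi(\Delta X_{i,5j \bmod 16}) + \chi(\Delta X_{i+1,j}) \right) \le 8 d_{i,4}, \tag{5}$$

where $i \in \{1,2,3\}$ and $d_{i,j} \in \{0,1\}$ ($1 \le j \le 4$). Notice
that $d_{i,j} = 0$ if and only if all eight differences before and after
MixColumns are zero, and $d_{i,j} = 1$ otherwise. Here, we do not consider the
case of $i = 4$ since the linear transformations in Round 4 do not influence
the probability of a differential characteristic.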


Additional Constraints. To avoid the trivial solution where the minimum number
of active S-boxes is zero, the following constraint

$$\sum_{l=0}^{15} \chi(\Delta X_{1,l}) \ge 1 \tag{6}$$

is added to ensure that at least one S-box is active. For a further step, the
constraint

$$\sum_{k \in \{0,2,8,10\}} \left( \chi(\Delta X_{2,k}) + \chi(\Delta X_{4,k}) \right) + \sum_{l \in \{4,6,12,14\}} \chi(\Delta X_{3,l}) = n \ (\text{or} \le n) \tag{7}$$

is added to the system. That is, all differential characteristics are classified
by the number of active leaked bytes. Constraint (7) helps us quickly locate the
pattern of differential characteristics with minimum effective active S-boxes.
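
The objective and constraints above can be evaluated directly on a concrete byte-activity pattern, which is a convenient way to sanity-check the formulation before handing it to a solver. The sketch below is not the Maple code of the paper and does not solve the MILP: it only checks constraints (2)-(5) and evaluates objective (1) for one pattern of type 1-4-16-4. Each state is given as the set of active byte indices (column-major numbering, as in constraint (7)); the positions chosen for the fourth state are a hypothetical choice consistent with the counts in the text.

```python
# Leaked byte positions, as used in constraint (7).
LEAK_ODD = {0, 2, 8, 10}    # leaked positions in X_2 and X_4
LEAK_EVEN = {4, 6, 12, 14}  # leaked positions in X_3

def column_sums(x_cur, x_next):
    """Active bytes entering and leaving each MixColumns column of one round."""
    sums = []
    for col in range(4):
        js = range(4 * col, 4 * col + 4)
        before = sum((5 * j) % 16 in x_cur for j in js)  # chi(dX_{i, 5j mod 16})
        after = sum(j in x_next for j in js)             # chi(dX_{i+1, j})
        sums.append(before + after)
    return sums

def satisfies_mixcolumns(states):
    """Constraints (2)-(5) for rounds i = 1, 2, 3: each column sum is 0 or >= 5."""
    return all(s == 0 or s >= 5
               for i in range(3)
               for s in column_sums(states[i], states[i + 1]))

def active_leaked(states):
    """The triple [l_1, l_2, l_3] of active leaked bytes."""
    _, x2, x3, x4 = states
    return [len(x2 & LEAK_ODD), len(x3 & LEAK_EVEN), len(x4 & LEAK_ODD)]

def effective_active(states):
    """Objective (1): active S-boxes minus active leaked bytes."""
    return sum(len(x) for x in states) - sum(active_leaked(states))

pattern = [
    {13},              # X_1: single active byte, as in Delta_in
    {8, 9, 10, 11},    # X_2: one fully active column
    set(range(16)),    # X_3: all bytes active
    {0, 5, 8, 13},     # X_4: hypothetical positions, one per column
]

print(satisfies_mixcolumns(pattern), active_leaked(pattern), effective_active(pattern))
```

For this pattern the check reports [2,4,2] active leaked bytes and 17 effective active S-boxes, matching the 25 active S-boxes and 8 active leaked bytes of the basic attack. An MILP solver searches over all such patterns instead of testing a single one.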

Since a four-round differential characteristic has at least 25 active S-boxes,
the number of effective active S-boxes is at least $25 - n$ if $n$ active
leaked bytes are involved. Experimental results confirm this but bring us more
knowledge. We solve 11 MILP problems by setting $n$ to different values, that
is, $n \le 2, 3, \ldots, 8$ and $n = 9, 10, 11, 12$. Here, we choose the Maple
software [1] to solve them. The minimum number of effective active S-boxes,
denoted by $m$, classified by the number of active leaked bytes, is given in
Table 1. Each MILP problem takes a few seconds to return the optimal solution
by running the code in Appendix B.


Table 1. Minimum number $m$ of effective active S-boxes, if ($\le$) $n$ active
leaked bytes are included in a differential characteristic


From Table 1, we conclude that the probability of a differential
characteristic is at most $2^{-96}$, since a differential characteristic has
at least 16 effective active S-boxes. What is more, exactly 9 or 10 active
leaked bytes are involved if a differential characteristic has 16 effective
active S-boxes. An interesting observation is that the minimum number of
active S-boxes (i.e., $n + m$) may be greater than 25 if too many active
leaked bytes are included in a differential characteristic, because it has to
cover too many specific positions in these cases.

Now, we demonstrate that only 4 kinds of differential characteristics may
have exactly 16 effective active S-boxes by analyzing the distribution of 9 or
10 active leaked bytes in a four-round differential characteristic. This is
done by adding more concrete constraints to the MILP step by step. We choose
the case $l_1 + l_2 + l_3 = 10$ to show the way of determining the
distribution of the 10 active leaked bytes in each round. A similar process is
applied to $l_1 + l_2 + l_3 = 9$. The procedure is summarized in Table 2.

Since $l_1 + l_2 + l_3 = 10$, we have $l_2 = 2$, 3 or 4. The minimum number of
effective active S-boxes is 17, 20 and 16 if $l_2 = 2$, 3 and 4, respectively.
Thus, to find differential characteristics with exactly 16 effective active
S-boxes, we only need to consider $l_2 = 4$, which implies $l_1 + l_3 = 6$.
For a further step, we have $l_1 = 2$, 3 or 4. The minimum number of effective
active S-boxes is 17, 20 and 16 if $[l_1, l_2] = [2,4]$, $[l_1, l_2] = [3,4]$
and $[l_1, l_2] = [4,4]$, respectively. Therefore, differential
characteristics with exactly 10 active leaked bytes and 16 effective active
S-boxes exist only if $[l_1, l_2, l_3] = [4,4,2]$. Combined with Lemma 1,
$l_1 = 4$ implies $n_1 \ge 2$ and $n_1 + n_2 \ge 10$ since at least two
columns are active in the first MixColumns layer; $[l_1, l_2] = [4,4]$ implies
$n_2 + n_3 \ge 20$; $[l_2, l_3] = [4,2]$ implies $n_3 + n_4 \ge 15$ and
$n_4 \ge 4$, where $n_4 \ge 4$ since two active leaked bytes appear in round 4
and at least two active bytes will appear in two non-leaking columns. Thus,
for the case $[l_1, l_2, l_3] = [4,4,2]$, only one type of differential
characteristic, 2-8-12-4, can appear.

Summary of the Analysis. From the above discussion, we conclude that the
number of effective active S-boxes is at least 16 in a differential characteristic.
There are four types of differential characteristics, “2-3-12-8”, “2-8-12-4”,
“2-8-12-3” and “4-6-9-6”, which can reach this lower bound.
Table 2. Minimum number m of effective active S-boxes with more constraints, the
distribution of 9 or 10 active leaked bytes in these rounds, and the type of possible
differential characteristic

  n    additional constraints   m    [l_1,l_2,l_3]   Type of differential characteristic
  10   l_2 = 2                  17   discard
  10   l_2 = 3                  20   discard
  10   l_2 = 4, l_1 = 2         17   discard
  10   l_2 = 4, l_1 = 3         20   discard
  10   l_2 = 4, l_1 = 4         16   [4,4,2]         2-8-12-4
  9    l_2 = 1                  16   [4,1,4]         4-6-9-6
  9    l_2 = 2                  17   discard
  9    l_2 = 3                  21   discard
  9    l_2 = 4, l_1 = 1         16   [1,4,4]         2-3-12-8
  9    l_2 = 4, l_1 = 2         17   discard
  9    l_2 = 4, l_1 = 3         21   discard
  9    l_2 = 4, l_1 = 4         16   [4,4,1]         2-8-12-3


After testing these four types of differential characteristics, we conclude
that there is no differential characteristic in which each effective active
S-box reaches the maximum differential probability $2^{-6}$. The differential
characteristic with the best probability is of the type “2-8-12-4”, and the
details are given in Fig. 6. In this differential characteristic, the
probability of one effective active S-box is $2^{-7}$. So the overall
probability of the differential characteristic is
$2^{-6 \times 15 + (-7)} = 2^{-97}$. This is the best success rate of the
forgery attack against ALE. For this differential characteristic, the values
of 8 leaked bytes (from the first two rounds) should be one of the $2^8$
values tabulated in Appendix A. The probability of a random keystream block
satisfying the requirement is $2^8 / 2^{8 \times 8} = 2^{-56}$. If each key is
restricted to protect $2^{48}$ message bits ($2^{41}$ message blocks), we need
to observe $2^{15}$ keys to find a weak keystream block to launch the attack.
The experimental results of this attack on a small version of ALE are given in
Sect. 6.1.
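
The figures in this paragraph follow from a few exponent computations, sketched below in base-2 logarithms with illustrative names.

```python
# 16 effective active S-boxes: 15 at probability 2^-6 and one at 2^-7.
log2_success = -6 * 15 - 7

# 2^8 admissible values among all values of 8 leaked bytes (64 bits).
log2_weak_keystream = 8 - 8 * 8

# Each key protects 2^48 message bits, i.e. 2^48 / 128 = 2^41 blocks.
log2_blocks_per_key = 48 - 7

# Keys to observe so that one weak keystream block is expected.
log2_keys_needed = -log2_weak_keystream - log2_blocks_per_key

print(log2_success, log2_weak_keystream, log2_blocks_per_key, log2_keys_needed)
```

The per-key data limit therefore does not prevent the attack; it only spreads the required $2^{56}$ known blocks over about $2^{15}$ keys.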


4.2 Reducing the Number of Known Plaintext Blocks 

There are two approaches to reduce the number of known plaintext blocks re- 
quired in the attack. One approach is to allow differential probability of 2 -7 
for some effective active S-boxes; another approach is to reduce the number of 
active leaked bytes in a differential characteristic. In these two approaches, with 
the reduced success rate, we are able to reduce the number of known plaintext 
blocks drastically. 


Relaxing Conditions on Effective Active S-boxes. When we try to find the
best probability for the differential characteristics, it is important to
restrict as many effective active S-boxes as possible to have probability
$2^{-6}$ for the input and output differences. However, if we are not
satisfied with the large number of plaintext blocks required to launch the
attack, we can relax the condition on some active S-boxes to have probability
$2^{-7}$. For instance, the probability of a random keystream satisfying the
requirements for leaked bytes in the differential characteristic presented in
Sect. 4.1 is $2^{-56}$. However, if we relax the probabilities on two
effective active S-boxes to $2^{-7}$, this probability increases to at least
$2^{-50}$, because the number of differential characteristics increases by a
factor of at least $2^6$ by our test. It can be increased further if more
conditions on effective active S-boxes are relaxed.

Fig. 6. Differential path of type “2-8-12-4”. The hexadecimal numbers indicate the
difference values. The empty squares indicate no difference. The squares of leaked bytes
are marked with gray color.

Reducing the Number of Active Leaked Bytes in the First Two Rounds.

Another way to reduce the number of known plaintext blocks is to reduce the
number of active leaked bytes in the first two rounds. The reason is that only
the active leaked bytes in the first two rounds are related to the conditions
on leaked bytes. No matter what values the active leaked bytes take in Round 3,
we can determine the corresponding differences after the S-box layer according
to the leaked values. The only cost is an additional pre-computed look-up
table. One good choice is to let the numbers of active leaked bytes in the
first three rounds be [4,0,4], with a differential characteristic of type
“6-4-6-9”. In this case, we only need to check conditions on the four active
leaked bytes in the first round, yet we can still have a relatively good
differential probability. There are 762408 possible differential
characteristics in the first two rounds when all the 17 effective active
S-boxes have probability $2^{-6}$, resulting in a success rate of $2^{-102}$
for the forgery attack. The average number of solutions for an active S-box is
estimated as $2 \times 126/127 + 4 \times 1/127 = 2^{1.01}$. Therefore, the
probability that a random keystream satisfies the conditions on leaked bytes is
$2^{1.01 \times 4} \times 762408 / 2^{32} = 2^{-8.4}$. The details of one of
the 762408 differential characteristics are provided in Fig. 7.



Fig. 7. Differential Path of type “6-4-6-9” . The hexadecimal numbers indicate the 
difference values. The empty squares indicate no difference. The squares of leaked 
bytes are marked with gray color. 
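The estimate $2 \times 126/127 + 4 \times 1/127$ reflects the shape of one row
of the difference distribution table of the S-box. The following sketch
rebuilds the AES S-box from its standard definition (we assume this is the
S-box relevant here), counts the solutions for one input difference, and redoes
the $2^{-8.4}$ computation. Function and variable names are illustrative.

```python
from math import log2

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox():
    """Build the AES S-box: field inversion followed by the affine map."""
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
    box = []
    for x in range(256):
        b = inv[x]
        s = b
        for i in range(1, 5):
            s ^= ((b << i) | (b >> (8 - i))) & 0xFF  # rotate left by i bits
        box.append(s ^ 0x63)
    return box

SBOX = aes_sbox()

def ddt_row(din):
    """For each output difference, count inputs x with S(x) ^ S(x ^ din) equal to it."""
    row = [0] * 256
    for x in range(256):
        row[SBOX[x] ^ SBOX[x ^ din]] += 1
    return row

# Nonzero entries of one row (input difference 0x01 chosen arbitrarily).
counts = sorted(c for c in ddt_row(0x01) if c)
avg_solutions = sum(counts) / len(counts)

# Four active leaked bytes, 762408 characteristics, 2^32 possible leaked values.
log2_prob = 4 * log2(avg_solutions) + log2(762408) - 32

print(len(counts), counts.count(2), counts.count(4))
print(avg_solutions, round(log2_prob, 1))
```

The row has 127 possible output differences, 126 of them with two solutions and
one with four, which is where the weights 126/127 and 1/127 come from.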


5 Effect of Removing the Whitening Key Layer 

In this section, we show that the results may be further improved if the
whitening key layer is removed. The success rate of a forgery attack can reach
around $2^{-93.1}$, and only one or two plaintext blocks are needed to launch
the attack.

Once the whitening key layer is removed, four additional bytes before the
first S-box layer are known to an attacker, i.e., bytes $X_{1,4}$, $X_{1,6}$,
$X_{1,12}$ and $X_{1,14}$. They are obtained by XORing the previous message
block with the last four leaked bytes from processing the previous message
block. Thus, at most 16 leaked bytes can be exploited. In the following
discussions, we denote by $l_0$ the number of active leaked bytes before the
first S-box layer, while $l_1$, $l_2$ and $l_3$ still indicate the number of
active leaked bytes in the first three rounds, respectively.
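First, we analyze the smallest number of effective active S-boxes in a
differential characteristic. The objective function is adjusted to minimize the
value of

$$\sum_{i=1}^{4}\sum_{j=0}^{15} \chi(\Delta X_{i,j}) - \sum_{k \in \{4,6,12,14\}} \big(\chi(\Delta X_{1,k}) + \chi(\Delta X_{3,k})\big) - \sum_{l \in \{0,2,8,10\}} \big(\chi(\Delta X_{2,l}) + \chi(\Delta X_{4,l})\big), \qquad (8)$$

since now four additional bytes are leaked before the first S-box layer.
Similarly, the corresponding constraint is adjusted to

$$\sum_{k \in \{4,6,12,14\}} \big(\chi(\Delta X_{1,k}) + \chi(\Delta X_{3,k})\big) + \sum_{l \in \{0,2,8,10\}} \big(\chi(\Delta X_{2,l}) + \chi(\Delta X_{4,l})\big) = n. \qquad (9)$$

Notice that $n = l_0 + l_1 + l_2 + l_3$.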


Table 3. Minimum number m of effective active S-boxes, if n active leaked bytes are 
included in a differential characteristic 

| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |

| m | 30 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 19 | 18 | 22 | 21 | 25 | 24 |


The minimum number of effective active S-boxes classified by the number of
active leaked bytes is given in Table 3. We conclude that a differential
characteristic involves at least 15 effective active S-boxes. Thus, the best
probability of a differential characteristic is at most $2^{-90}$. Going one
step further, exactly 10 active leaked bytes are included in a differential
characteristic with 15 effective active S-boxes, that is,
$l_0 + l_1 + l_2 + l_3 = 10$. Similar to the process used for the
corresponding earlier table, the distribution of the 10 active leaked bytes in
these four rounds is studied by adding more and more constraints to the MILP
problems. This is done by first studying the sum $l_1 + l_2$, which may be
2, ..., 7 or 8, and then investigating the values of $l_i$ ($0 \le i \le 3$).
The results are given in Table 4.
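As a small illustration of how Table 3 is read, the sketch below stores the
table as a mapping from n to m (values copied from the table; the names are
ours) and recovers the minimum together with the resulting probability bound.

```python
# Table 3 as a mapping n -> m, n = 0..16 (values copied from the table).
table3 = dict(zip(range(17),
                  [30, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,
                   19, 18, 22, 21, 25, 24]))

# Number of active leaked bytes giving the fewest effective active S-boxes.
best_n = min(table3, key=table3.get)
best_m = table3[best_n]

# Each effective active S-box contributes a factor of at most 2^-6.
best_prob_log2 = -6 * best_m

print(best_n, best_m, best_prob_log2)
```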

From Table 4, we conclude that a differential characteristic with 15 effective
active S-boxes exists only if the concrete distribution of the 10 active leaked
bytes satisfies

1) $[l_0, l_1, l_2, l_3] = [4,0,2,4]$, $[4,2,0,4]$, $[2,0,4,4]$, $[4,0,4,2]$,
$[2,4,0,4]$ or $[4,4,0,2]$, and

2) $\chi(\Delta X_{i,4}) = \chi(\Delta X_{i,14})$ if $l_{i-1} = 2$ and
$i \in \{1, 3\}$; $\chi(\Delta X_{i,0}) = \chi(\Delta X_{i,2})$ if
$l_{i-1} = 2$ and $i \in \{2, 4\}$.

Then, we analyze all the 12 cases of differential characteristics with 15
effective active S-boxes. For each of the 12 cases listed in Table 4, different
types of differential characteristics may satisfy it. In this situation, we
maximize the number of effective active S-boxes in Round 1 and Round 4, as the
differential probability of effective active S-boxes in these two rounds can
always reach the maximum value $2^{-6}$ once the differential characteristic is
constructed using the start-from-the-middle technique, which is also employed
by the authors in [22]. The best differential characteristics we found are
given as follows.
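Table 4. Minimum number m of effective S-boxes with more constraints, and the
distribution of 10 active leaked bytes in these rounds

| $l_1 + l_2$ | Additional constraints | m | $[l_0, l_1, l_2, l_3]$ | Case number |
| 2 | $l_1 = 0$, $\chi(\Delta X_{3,4}) + \chi(\Delta X_{3,14}) = 0$ | 15 | [4,0,2,4] | #1 |
| 2 | $l_1 = 0$, $\chi(\Delta X_{3,4}) + \chi(\Delta X_{3,14}) = 1$ | 20 | discard | |
| 2 | $l_1 = 0$, $\chi(\Delta X_{3,4}) + \chi(\Delta X_{3,14}) = 2$ | 15 | [4,0,2,4] | #2 |
| 2 | $l_1 = 1$ | 20 | discard | |
| 2 | $l_1 = 2$, $\chi(\Delta X_{2,0}) + \chi(\Delta X_{2,2}) = 0$ | 15 | [4,2,0,4] | #3 |
| 2 | $l_1 = 2$, $\chi(\Delta X_{2,0}) + \chi(\Delta X_{2,2}) = 1$ | 20 | discard | |
| 2 | $l_1 = 2$, $\chi(\Delta X_{2,0}) + \chi(\Delta X_{2,2}) = 2$ | 15 | [4,2,0,4] | #4 |
| 3 | | 20 | discard | |
| 4 | $l_1 = 0$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 0$ | 15 | [2,0,4,4] | #5 |
| 4 | $l_1 = 0$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 1$ | 20 | discard | |
| 4 | $l_1 = 0$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 2$ | 15 | [2,0,4,4] | #6 |
| 4 | $l_1 = 0$, $l_0 = 3$ | 20 | discard | |
| 4 | $l_1 = 0$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 0$ | 15 | [4,0,4,2] | #7 |
| 4 | $l_1 = 0$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 1$ | 20 | discard | |
| 4 | $l_1 = 0$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 2$ | 15 | [4,0,4,2] | #8 |
| 4 | $l_1 = 1$ | 20 | discard | |
| 4 | $l_1 = 2$ | 18 | discard | |
| 4 | $l_1 = 3$ | 20 | discard | |
| 4 | $l_1 = 4$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 0$ | 15 | [2,4,0,4] | #9 |
| 4 | $l_1 = 4$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 1$ | 20 | discard | |
| 4 | $l_1 = 4$, $l_0 = 2$, $\chi(\Delta X_{1,4}) + \chi(\Delta X_{1,14}) = 2$ | 15 | [2,4,0,4] | #10 |
| 4 | $l_1 = 4$, $l_0 = 3$ | 20 | discard | |
| 4 | $l_1 = 4$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 0$ | 15 | [4,4,0,2] | #11 |
| 4 | $l_1 = 4$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 1$ | 20 | discard | |
| 4 | $l_1 = 4$, $l_0 = 4$, $\chi(\Delta X_{4,0}) + \chi(\Delta X_{4,2}) = 2$ | 15 | [4,4,0,2] | #12 |
| 5 | | 20 | discard | |
| 6 | | 17 | discard | |
| 7 | | 20 | discard | |
| 8 | | 16 | discard | |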



- For each of the 8 cases with $l_1 + l_2 = 4$, that is, cases #5 to #12, a
  differential characteristic with probability of about $2^{-93.1}$ can be
  constructed for almost all of the leaked information. Experimental results
  show that we cannot obtain a differential characteristic for 412, 443, 402
  and 373 out of the $2^{32}$ values of the leaked information in cases #5 and
  #6, cases #7 and #8, cases #9 and #10, and cases #11 and #12, respectively.
  Thus, on average, two plaintext blocks are enough to launch a forgery attack.
  The differential characteristic of case #10 is given in Appendix C.

- For each of the four cases with $l_1 + l_2 = 2$, that is, cases #1 to #4, a
  class of 1020 differential characteristics with average probability
  $2^{-94.1}$ can always be constructed, whatever the leaked information is.
  Thus, the forgery attack can be launched for any plaintext block.
  Differential paths of case #4 are given in Appendix D.
Summary of the Analysis. From the above discussion, the whitening key
layer is important for ALE. Once it is removed, more internal information is
leaked to an attacker, resulting in forgery attacks with higher success rates
and fewer required plaintext blocks. The success rate of a forgery attack is
then about $2^{-93.1}$ to $2^{-94.1}$, and at most 2 plaintext blocks are
needed.

6 Experiments on a Reduced Version of ALE 

As a proof of concept, we would like to apply our attacks to ALE (with the
whitening key). However, it is impossible to directly attack the original ALE
as the complexity is too high. Instead, we choose to attack a reduced ALE
construction based on an AES-like lightweight block cipher, LED [16].

The LED block cipher has a round function similar to that of AES, except that
the operation AddConstants is used before the S-box layer in each round, and
the round keys are added every four rounds. The S-box in LED has a difference
propagation probability of at most $2^{-2}$. Unlike the AES S-box, the output
difference attaining the best difference propagation probability may not be
unique. And for input difference 14, the probability $2^{-2}$ can never be
obtained. So we need to take care of these differences in the attack.

In our experiments, we modified the LED round function so that it has the 
same ordered operations: SubCells, ShiftRows, MixColumns, AddRoundKeys as 
AES. Since the differential characteristic is not related to the key schedule, we use 
random round keys rather than deriving them from the key schedule. In addition, 
we simplified the input message to the two-block case without considering the 
initialization, padding and the associated data. The initial state is randomly 
generated. 

6.1 The “2-8-12-4” Differential Characteristic 

In the optimized forgery attacks presented in Sect. 4.1, the differential
characteristic of type “2-8-12-4” is one of those that have the highest success
rate. We experimentally verify the results on this type of differential
characteristic.

Estimations. Using the above reduced ALE, we searched for differential
characteristics of type “2-8-12-4”. As in the case of the original ALE, we need
to relax the difference propagation probability of one effective active S-box
to find a valid differential characteristic. Fig. 10 in Appendix E illustrates
one of the differential characteristics we found.

To estimate the probability that a random keystream block is vulnerable to
the attack, we analyze the number of solutions for the values of active leaked
bytes in the first two rounds. In each of the first two rounds, there are $2^6$
possible solutions for the values of the four active leaked bytes. Therefore,
the probability that a random keystream block satisfies the conditions on
leaked bytes is estimated as $2^6 \times 2^6 \times 2^{(-4) \times 8} = 2^{-20}$.
The average number of plaintext blocks needed to get a vulnerable keystream
block is thus $1 + 1/2^{-20} = 1 + 2^{20}$. Notice that we need an extra
plaintext block to introduce the initial differences.
There are 16 effective active S-boxes in the chosen differential characteristic:
15 of the effective active S-boxes have differential probability $2^{-2}$, and
one has probability $2^{-3}$. So the estimated probability of the differential
characteristic is $2^{(-2) \times 15 + (-3) \times 1} = 2^{-33}$, which is also
the success rate of the forgery attack.

Experimental Results. First, we check the probability of the vulnerable
keystream blocks. After encrypting $2^{27.1}$ random plaintext blocks, we found
$2^7$ vulnerable keystream blocks. Hence, the average number of plaintext
blocks needed to find a vulnerable keystream block is
$2^{27.1 - 7} = 2^{20.1}$, which matches the estimated value.

Then, we verify the success rate of the forgery attack. For a vulnerable
keystream block, the value of the final state is XORed with the second message
block and stored as $t_1$. The differences in the final state (thus the leaked
bytes) in Round 4 are determined by the values of leaked bytes in Round 3. Then
we compute two forged ciphertext blocks similar to the attack procedure in
Sect. 3 (but using the difference shown in Fig. 10 in Appendix E). We decrypt
the forged ciphertext blocks and XOR the second plaintext block from decryption
with the final state to get $t_2$. If the two internal states $t_1$ and $t_2$
collide, we get a successful forgery. After examining $2^{36.36}$ vulnerable
keystream blocks, we managed to get 10 collisions at the internal states after
two blocks. So the average probability for one successful forgery is
$2^{-33.04}$. One of the successful forgeries is given in Appendix E.
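The estimates and measurements of this subsection can be cross-checked with a
short calculation. The sketch below works with base-2 exponents only; the names
are illustrative, and the 4-bit cell size is that of the LED-based reduced
construction described above.

```python
from math import log2

# Estimated probability that a keystream block is vulnerable:
# 2^6 solutions in each of the first two rounds, 8 active 4-bit cells.
vulnerable_log2 = 6 + 6 + (-4) * 8

# Estimated characteristic probability: 15 S-boxes at 2^-2 and one at 2^-3.
char_log2 = (-2) * 15 + (-3) * 1

# Measured: 2^7 vulnerable blocks among 2^27.1 random plaintext blocks.
measured_blocks_log2 = 27.1 - 7

# Measured: 10 collisions among 2^36.36 vulnerable keystream blocks.
measured_success_log2 = log2(10) - 36.36

print(vulnerable_log2, char_log2)
print(round(measured_blocks_log2, 1), round(measured_success_log2, 2))
```

The measured exponents stay within a fraction of a bit of the estimated ones.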


6.2 The “6-4-6-9” Differential Characteristic 

In Sect. 4.2, the differential characteristics of type “6-4-6-9” (Fig. 7) are
observed to require a small number of known plaintext blocks yet have a good
success rate. We experimentally tested this case on the reduced version of ALE.

Estimations. For this type, we found 1400 differential characteristics for the
first two rounds, resulting in 21311 different values for the leaked bytes in
the first round. Details of one of the differential characteristics are given
in Fig. 11 in Appendix F. It is interesting to notice that certain leaked
values may be used in more than one differential characteristic. If we take
this into consideration, there are 28657 different leaked values related to the
1400 differential characteristics. Since there are only four active leaked
bytes in the first two rounds, the probability that a random keystream is
vulnerable is $28657 / 2^{4 \times 4} = 2^{-1.12}$. Thus, the estimated number
of plaintext blocks needed to find a vulnerable keystream block is
$1 + 1/2^{-1.12} = 2^{1.7}$.

There are 17 effective active S-boxes in the differential characteristic. All
of them attain the maximum differential probability $2^{-2}$. So the estimated
probability of the differential characteristic is
$2^{(-2) \times 17} = 2^{-34}$, which is also the success rate of the forgery
attack.

Experimental Results. In our experiments, $2^{20.7}$ vulnerable keystream
blocks are generated from the encryption of $2^{21.6}$ random 2-block
plaintexts. So the average number of blocks needed to find one vulnerable
keystream block is $2 \times 2^{21.6} / 2^{20.7} = 2^{1.9}$, which is close to
the estimated value.

After querying $2^{37.7}$ forged ciphertexts, we found 10 collisions in the
internal states. So the average probability of a successful forgery is around
$2^{-34.4}$, which is close to the estimated $2^{-34}$. One of the successful
forgeries is given in Appendix F.
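The same kind of check applies to the “6-4-6-9” experiment. The sketch below
(illustrative names, base-2 exponents) recomputes the measured quantities from
the reported counts and compares them with the 17-S-box estimate.

```python
from math import log2

# 2^21.6 two-block plaintexts yielded 2^20.7 vulnerable keystream blocks;
# the leading 1 accounts for the factor 2 (two blocks per plaintext).
blocks_per_hit_log2 = 1 + 21.6 - 20.7

# 10 collisions among 2^37.7 forged ciphertexts.
success_log2 = log2(10) - 37.7

# Estimate: 17 effective active S-boxes, each at 2^-2.
estimate_log2 = (-2) * 17

print(round(blocks_per_hit_log2, 1), round(success_log2, 1), estimate_log2)
```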

7 Conclusion 

The ALE authenticated encryption algorithm is claimed to have a forgery success
rate of $2^{-128}$. In this paper, we show that the success rate is
significantly higher than the claimed rate. By applying the proposed
leaked-state-forgery attack, the success rate can reach $2^{-97}$. For a
success rate of $2^{-102}$, one out of every $2^{8.4}$ plaintext blocks is
vulnerable to the forgery attack. We also show that the whitening key layer is
important for ALE: if it is removed, the forgery attack improves to success
probabilities from $2^{-93.1}$ to $2^{-94.1}$, and at most two plaintext blocks
are needed. Our attacks are well supported by the experimental results on a
reduced version of ALE. Our attack confirms again that “it is very easy to
accidentally combine secure encryption schemes with secure MACs and still get
insecure authenticated encryption schemes” [23]. Hence, in the design of
authenticated encryption algorithms, we should be very cautious in analyzing
the interaction between encryption and authentication.
A Values of Leaked-Bytes

The values of leaked bytes for the differential characteristic used in the
basic LSFA in Sect. 3 are given in Table 5. The index is the byte position in
the keystream block. $\delta_{in}$ and $\delta_{out}$ are the input and output
differences for the S-box. $\alpha$ and $\beta$ can be arbitrary values
extracted from the leaked bytes in Round 3. From the table, the total number of
possible values at the active leaked bytes in the first two rounds is
$2 \times 2 \times 2 \times 2 \times 4 \times 2 = 128$.


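Table 5. Possible values of leaked bytes in hexadecimal for the basic LSFA. “-”
indicates no difference. “*” indicates arbitrary values. $\alpha$ and $\beta$
are values from the leaked bytes.

| Index | $\delta_{in}$ | $\delta_{out}$ | Value(s) |
| 0-1 | - | - | * |
| 2 | E | 42 | 11, 1F |
| 3 | F3 | C6 | F, FC |
| 4 | 59 | FC | 23, 7A |
| 5 | 37 | E5 | 19, 2E |
| 6 | 6E | FC | 0, 6E, 8C, E2 |
| 7 | B2 | E5 | 46, F4 |
| 8 | - | - | * |
| 9 | 81 | $S(\alpha) \oplus S(81 \oplus \alpha)$ | $\alpha$ |
| 10 | 6C | $S(\beta) \oplus S(6C \oplus \beta)$ | $\beta$ |
| 11-15 | - | - | * |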


The values of leaked bytes for the differential characteristic used in the
optimized LSFA in Sect. 4.2 are given in Table 6. The total number of possible
values at the active leaked bytes in the first two rounds is $2^8$.
Table 6. Possible values of leaked bytes in hexadecimal for the optimized LSFA
in Sect. 4.2. “-” indicates no difference. “*” indicates arbitrary values.
$\alpha$ and $\beta$ are values from the leaked bytes.

| Index | $\delta_{in}$ | $\delta_{out}$ | Value(s) |
| 0 | 49 | 84 | 1D, 54 |
| 1 | CE | 97 | 33, FD |
| 2 | 87 | 35 | 44, C3 |
| 3 | 92 | 13 | 5E, CC |
| 4 | 74 | 89 | 10, 64 |
| 5 | 57 | 73 | B0, E7 |
| 6 | A6 | 23 | 6D, CB |
| 7 | 3A | 13 | 08, 32 |
| 8-9 | - | - | * |
| 10 | 3D | $S(\alpha) \oplus S(3D \oplus \alpha)$ | $\alpha$ |
| 11 | EE | $S(\beta) \oplus S(EE \oplus \beta)$ | $\beta$ |
| 12-15 | - | - | * |
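The entries of Tables 5 and 6 for the first two rounds can be checked against
the S-box directly: every listed value x must satisfy
$S(x) \oplus S(x \oplus \delta_{in}) = \delta_{out}$, and no other value may do
so. The sketch below assumes the AES S-box (rebuilt from its standard
definition) and uses values transcribed from the two tables; the names TABLE5,
TABLE6 and check are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox():
    """Build the AES S-box: field inversion followed by the affine map."""
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
    box = []
    for x in range(256):
        b = inv[x]
        s = b
        for i in range(1, 5):
            s ^= ((b << i) | (b >> (8 - i))) & 0xFF  # rotate left by i bits
        box.append(s ^ 0x63)
    return box

SBOX = aes_sbox()

# (index, delta_in, delta_out, listed values) for the active bytes of rounds 1-2.
TABLE5 = [
    (2, 0x0E, 0x42, [0x11, 0x1F]), (3, 0xF3, 0xC6, [0x0F, 0xFC]),
    (4, 0x59, 0xFC, [0x23, 0x7A]), (5, 0x37, 0xE5, [0x19, 0x2E]),
    (6, 0x6E, 0xFC, [0x00, 0x6E, 0x8C, 0xE2]), (7, 0xB2, 0xE5, [0x46, 0xF4]),
]
TABLE6 = [
    (0, 0x49, 0x84, [0x1D, 0x54]), (1, 0xCE, 0x97, [0x33, 0xFD]),
    (2, 0x87, 0x35, [0x44, 0xC3]), (3, 0x92, 0x13, [0x5E, 0xCC]),
    (4, 0x74, 0x89, [0x10, 0x64]), (5, 0x57, 0x73, [0xB0, 0xE7]),
    (6, 0xA6, 0x23, [0x6D, 0xCB]), (7, 0x3A, 0x13, [0x08, 0x32]),
]

def solutions(din, dout):
    """All byte values x with S(x) ^ S(x ^ din) == dout, in increasing order."""
    return [x for x in range(256) if SBOX[x] ^ SBOX[x ^ din] == dout]

def check(table):
    """Verify every row and return the number of admissible value combinations."""
    total = 1
    for index, din, dout, values in table:
        assert solutions(din, dout) == sorted(values), f"row {index} mismatch"
        total *= len(values)
    return total

print(check(TABLE5), check(TABLE6))
```

The two returned counts correspond to the totals of 128 and $2^8$ stated in
this appendix.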


B Maple Program for Solving MILP Problems 


We employ the function “LPSolve” included in the “Optimization” package of
the Maple software to solve the MILP problems. To simplify the variables in the
MILP problems given in Sect. 4.1, we compress $\chi(\Delta X_{i,j})$ and
$d_{i,j}$ to xij and dij here.

Then, the results in Table 1 are obtained by running the following program.


with(Optimization);

# If n<=8, the last constraint x20+x22+...+x48+x410>=n will be removed.
# n must be assigned a value before the call.

LPSolve(x10+x11+x12+x13+x14+x15+x16+x17+x18+x19+x110+x111+x112+x113
+x114+x115+x21+x23+x24+x25+x26+x27+x29+x211+x212+x213+x214
+x215+x30+x31+x32+x33+x35+x37+x38+x39+x310+x311+x313+x315
+x41+x43+x44+x45+x46+x47+x49+x411+x412+x413+x414+x415,

{x10+x15+x110+x115+x20+x21+x22+x23>=5*d11,
x10+x15+x110+x115+x20+x21+x22+x23<=8*d11,
x14+x19+x114+x13+x24+x25+x26+x27>=5*d12,
x14+x19+x114+x13+x24+x25+x26+x27<=8*d12,
x18+x113+x12+x17+x28+x29+x210+x211>=5*d13,
x18+x113+x12+x17+x28+x29+x210+x211<=8*d13,
x112+x11+x16+x111+x212+x213+x214+x215>=5*d14,
x112+x11+x16+x111+x212+x213+x214+x215<=8*d14,
x20+x25+x210+x215+x30+x31+x32+x33>=5*d21,
x20+x25+x210+x215+x30+x31+x32+x33<=8*d21,
x24+x29+x214+x23+x34+x35+x36+x37>=5*d22,
x24+x29+x214+x23+x34+x35+x36+x37<=8*d22,
x28+x213+x22+x27+x38+x39+x310+x311>=5*d23,
x28+x213+x22+x27+x38+x39+x310+x311<=8*d23,
x212+x21+x26+x211+x312+x313+x314+x315>=5*d24,
x212+x21+x26+x211+x312+x313+x314+x315<=8*d24,
x30+x35+x310+x315+x40+x41+x42+x43>=5*d31,
x30+x35+x310+x315+x40+x41+x42+x43<=8*d31,
x34+x39+x314+x33+x44+x45+x46+x47>=5*d32,
x34+x39+x314+x33+x44+x45+x46+x47<=8*d32,
x38+x313+x32+x37+x48+x49+x410+x411>=5*d33,
x38+x313+x32+x37+x48+x49+x410+x411<=8*d33,
x312+x31+x36+x311+x412+x413+x414+x415>=5*d34,
x312+x31+x36+x311+x412+x413+x414+x415<=8*d34,
x14+x16+x112+x114+x10+x11+x12+x13+x111+x110+x15+x17+x18+x19+x113+x115>=1,
x20+x22+x28+x210+x34+x36+x312+x314+x40+x42+x48+x410<=n,
x20+x22+x28+x210+x34+x36+x312+x314+x40+x42+x48+x410>=n
# The listing breaks off here in the source; the closing below, with all
# variables assumed binary, is a reconstruction and not the authors' text.
}, assume = binary);
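The 24 column constraints in the program follow a regular pattern: ShiftRows
moves bytes 4c, 4c+5, 4c+10 and 4c+15 (mod 16) of round r into column c, and
MixColumns maps them to bytes 4c, ..., 4c+3 of round r+1. As a sketch of how
such a listing could be generated rather than typed by hand, the hypothetical
helper below emits the same constraint strings. It is written in Python for
illustration and is not part of the authors' Maple program.

```python
def column_constraints(rounds=3):
    """Return Maple-style branch-number constraints for each MixColumns column."""
    out = []
    for r in range(1, rounds + 1):
        for c in range(4):
            # Bytes entering column c after ShiftRows (column-major byte order).
            ins = [f"x{r}{(4 * c + 5 * k) % 16}" for k in range(4)]
            # Bytes of column c at the input of the next round.
            outs = [f"x{r + 1}{4 * c + k}" for k in range(4)]
            lhs = "+".join(ins + outs)
            d = f"d{r}{c + 1}"
            out.append(f"{lhs}>=5*{d}")  # an active column has at least 5 active bytes
            out.append(f"{lhs}<=8*{d}")  # an inactive column has none
    return out

constraints = column_constraints()
for line in constraints[:4]:
    print(line)
```

Generating the constraints this way makes it easier to spot transcription
slips in the byte indices.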


C Case #10: $[l_0, l_1, l_2, l_3] = [2, 4, 0, 4]$ with
$\chi(\Delta X_{1,4}) = \chi(\Delta X_{1,14}) = 1$

The type of the differential characteristic is shown in Fig. 8. The
distribution of active S-boxes in these rounds is $9 \to 6 \to 4 \to 6$, for a
total of 25 active S-boxes. In Fig. 8, from $\Delta X_1$ to $\Delta Z_4$,
squares marked with broken lines are active, squares marked with backslashes
should be chosen to satisfy some conditions, and empty squares have no
difference.

We denote by MC the matrix used in the MixColumns layer. Based on the 
MDS property of matrix MC, once any four out of the eight differences before 
and after the matrix MC are given, then another four differences are uniquely 
determined and can be calculated efficiently. 
Fig. 8. A differential characteristic with $[l_0, l_1, l_2, l_3] = [2, 4, 0, 4]$
and $\chi(\Delta X_{1,4}) = \chi(\Delta X_{1,14}) = 1$. Gray squares denote
leaked bytes. Squares marked with broken lines are active, squares marked with
backslashes should be chosen to satisfy some conditions, and empty squares have
no difference.


Now, we specify the differential characteristic following the type of Fig. 8.
From $\Delta X_1$ to $\Delta Z_4$, bytes without a Greek letter have difference
zero, and the difference of a byte with a Greek letter (i.e., $\alpha$,
$\beta$, $\gamma$, $\eta$, $\mu$, $\nu$ and $\sigma$) will be determined in the
subsequent discussions. Since $\Delta X_5 = MC(\Delta Z_4)$, we obtain the
values of $\lambda_j$ ($1 \le j \le 16$) once the $\nu'_i$'s and $\sigma'_i$'s
($3 \le i \le 6$) are determined. The procedure of constructing this
differential characteristic is given as follows.
1. Construct a differential characteristic from $\Delta X_2$ to $\Delta Z_3$.

1-1. We start at the MixColumns layer of round 2, and match the differences
($\alpha_1, \alpha_2, \ldots, \alpha_5$) first (see the starting point of
Fig. 8). That is, we have to choose nonzero $\alpha_1$, $\alpha_2$, $\alpha_3$,
$\alpha_4$ and $\alpha_5$ such that
$(\alpha_4, 0, \alpha_5, 0) = (\alpha_1, \alpha_2, \alpha_3, 0) \cdot MC^t$.
This is done by choosing an arbitrary difference $\alpha_1 \ne 0$ and computing
$(\alpha_2, \alpha_3, \alpha_4, \alpha_5) = (4\alpha_1, 7\alpha_1, 9\alpha_1, B\alpha_1)$.

1-2. Compute $\beta_1 = S^{-1}(\alpha_1 \oplus S(X_{2,0})) \oplus X_{2,0}$ and
$\gamma_2 = S^{-1}(\alpha_3 \oplus S(X_{2,10})) \oplus X_{2,10}$.

1-3. Choose $\beta_2$ such that one of $\beta_3, \ldots, \beta_6$ is zero,
where $(\beta_3, \beta_4, \beta_5, \beta_6)^t = MC^{-1} \cdot (\beta_1, 0, \beta_2, 0)^t$.
Thus, $\beta_2 \in \{D^{-1}E\beta_1, B^{-1}9\beta_1, E^{-1}D\beta_1, 9^{-1}B\beta_1\}$.
Similarly, choose $\gamma_1$ such that one of $\gamma_3, \ldots, \gamma_6$ is
zero. Thus, $\gamma_1 \in \{E^{-1}D\gamma_2, 9^{-1}B\gamma_2, D^{-1}E\gamma_2, B^{-1}9\gamma_2\}$.

1-4. Compute $\eta_1 = S(X_{2,8}) \oplus S(X_{2,8} \oplus \gamma_1)$ and
$\eta_2 = S(X_{2,2}) \oplus S(X_{2,2} \oplus \beta_2)$. Now, we have to check
whether there are nonzero $\eta_3$, $\eta_4$ and $\eta_5$ such that
$(\eta_4, 0, \eta_5, 0) = (\eta_1, 0, \eta_2, \eta_3) \cdot MC^t$. This is
equivalent to checking whether $\eta_1 = 7\eta_2$ (see the checking point of
Fig. 8).

1-5. If there is a triple $(\alpha_1, \beta_2, \gamma_1)$ such that
$\eta_1 = 7\eta_2$, compute $(\eta_3, \eta_4, \eta_5) = (4\eta_2, B\eta_2, 9\eta_2)$
and go on. Else, return “construction failure” and abort.

1-6. Choose $\mu_1, \mu_2$ such that
$Pr(\mu_1 \to \alpha_2) \cdot Pr(\mu_2 \to \eta_3) \ne 0$ and one of $\mu_4$,
$\mu_6$ is zero; choose $\nu_1, \nu_2$ such that
$Pr(\alpha_4 \to \nu_1) \cdot Pr(\eta_5 \to \nu_2) \ne 0$ and one of $\nu_4$,
$\nu_6$ is zero; choose $\sigma_1, \sigma_2$ such that
$Pr(\eta_4 \to \sigma_1) \cdot Pr(\alpha_5 \to \sigma_2) \ne 0$ and one of
$\sigma_4$, $\sigma_6$ is zero.

2. Construct the differences of outer rounds.

2-1. Compute $\mu'_3 = S^{-1}(\mu_3 \oplus S(X_{1,4})) \oplus X_{1,4}$ and
$\mu'_5 = S^{-1}(\mu_5 \oplus S(X_{1,14})) \oplus X_{1,14}$. Choose $\beta'_i$
($3 \le i \le 6$) such that $Pr(\beta'_i \to \beta_i) = 2^{-6}$ if
$\beta_i \ne 0$, or $\beta'_i = 0$ if $\beta_i = 0$; choose $\mu'_i$
($i = 4, 6$) such that $Pr(\mu'_i \to \mu_i) = 2^{-6}$ if $\mu_i \ne 0$, or
$\mu'_i = 0$ if $\mu_i = 0$; choose $\gamma'_i$ ($3 \le i \le 6$) such that
$Pr(\gamma'_i \to \gamma_i) = 2^{-6}$ if $\gamma_i \ne 0$, or $\gamma'_i = 0$
if $\gamma_i = 0$.

2-2. Compute $\nu'_3 = S(X_{4,0}) \oplus S(X_{4,0} \oplus \nu_3)$,
$\nu'_5 = S(X_{4,2}) \oplus S(X_{4,2} \oplus \nu_5)$,
$\sigma'_3 = S(X_{4,8}) \oplus S(X_{4,8} \oplus \sigma_3)$ and
$\sigma'_5 = S(X_{4,10}) \oplus S(X_{4,10} \oplus \sigma_5)$. Choose $\nu'_i$
($i = 4, 6$) such that $Pr(\nu'_i \to \nu_i) = 2^{-6}$ if $\nu_i \ne 0$, or
$\nu'_i = 0$ if $\nu_i = 0$; choose $\sigma'_i$ ($i = 4, 6$) such that
$Pr(\sigma'_i \to \sigma_i) = 2^{-6}$ if $\sigma_i \ne 0$, or $\sigma'_i = 0$
if $\sigma_i = 0$.

3. Compute $\Delta X_5 = MC(\Delta Z_4)$.
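The linear relations used in steps 1-1, 1-3 and 1-5 can be confirmed by direct
computation in GF($2^8$). The sketch below assumes that MC is the AES
MixColumns matrix acting on column vectors (the document only calls it the
matrix of the MixColumns layer), and all helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse of a nonzero field element (brute force)."""
    return next(y for y in range(1, 256) if gf_mul(a, y) == 1)

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
MC_INV = [[0xE, 0xB, 0xD, 0x9], [0x9, 0xE, 0xB, 0xD],
          [0xD, 0x9, 0xE, 0xB], [0xB, 0xD, 0x9, 0xE]]

def mat_vec(m, v):
    """Matrix times column vector over GF(2^8)."""
    out = []
    for row in m:
        acc = 0
        for coeff, x in zip(row, v):
            acc ^= gf_mul(coeff, x)
        out.append(acc)
    return out

def step_1_1(a1):
    """Output column for the input (a1, 4*a1, 7*a1, 0) of step 1-1."""
    return mat_vec(MC, [a1, gf_mul(4, a1), gf_mul(7, a1), 0])

def step_1_3_choices(b1):
    """The four candidates D^-1 E b1, B^-1 9 b1, E^-1 D b1, 9^-1 B b1 of step 1-3."""
    pairs = [(0xD, 0xE), (0xB, 0x9), (0xE, 0xD), (0x9, 0xB)]
    return [gf_mul(gf_mul(gf_inv(p), q), b1) for p, q in pairs]

def step_1_5(e2):
    """Output column for the input (7*e2, 0, e2, 4*e2) of steps 1-4 and 1-5."""
    return mat_vec(MC, [gf_mul(7, e2), 0, e2, gf_mul(4, e2)])

print([hex(v) for v in step_1_1(0x01)])
print([hex(v) for v in step_1_5(0x01)])
```

For every nonzero input difference, the outputs have zeros in rows 1 and 3, and
the k-th candidate of step 1-3 cancels the k-th entry of the inverse image.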

Notice that the 9 effective active S-boxes in Rounds 1 and 4 can always reach
the maximum differential probability $2^{-6}$. Thus, the probability of this
differential characteristic is between $2^{-7 \cdot 6 - 9 \cdot 6} = 2^{-96}$
and $2^{-15 \cdot 6} = 2^{-90}$ if it exists. The existence of this
differential characteristic is only related to the existence of a differential
characteristic in Rounds 2 and 3. Two questions, Q1 and Q2, are experimentally
verified to ensure the existence of a differential characteristic from
$\Delta X_2$ to $\Delta Z_3$:

Q1: For each $X = (X_{2,0}, X_{2,2}, X_{2,8}, X_{2,10})$, can we find a triple
$(\alpha_1, \beta_2, \gamma_1)$ in step 1-1 and step 1-3 such that the
condition $\eta_1 = 7\eta_2$ in step 1-4 is satisfied?

For each X, it is very likely that such a triple can be found, because there
are about $2^{12}$ choices of $(\alpha_1, \beta_2, \gamma_1)$ and the
probability of $\eta_1 = 7\eta_2$ is about $2^{-8}$. We enumerate all $2^{32}$
values of X and find that the number of “construction failures” is 402, that
is, there is at least one $(\alpha_1, \beta_2, \gamma_1)$ such that
$\eta_1 = 7\eta_2$ for $2^{32} - 402$ out of $2^{32}$ values of X. For each of
these X, we may store a candidate $(\alpha_1, \beta_2, \gamma_1)$ in a table,
which is indexed by the value of X (a redundant triple (0, 0, 0) may be
included for failure cases). The size of this table is $3 \times 2^{32}$ bytes.
The time complexity of this step is at most $2^{44}$.

Q2: For any nonzero $(\alpha_2, \eta_3)$ (resp. $(\alpha_4, \eta_5)$ and
$(\eta_4, \alpha_5)$), can we find a pair $(\mu_1, \mu_2)$ (resp.
$(\nu_1, \nu_2)$ and $(\sigma_1, \sigma_2)$) which satisfies the conditions
given in step 1-6?

Notice that $(\alpha_2, \eta_3)$ has $255^2$ choices, and $\mu_1$ and $\mu_2$
each have 127 choices once $(\alpha_2, \eta_3)$ is given. Thus, Q2 can be
verified with a time complexity of about $2^{30}$. For a given
$(\alpha_2, \eta_3)$, more than one pair $(\mu_1, \mu_2)$ may be found to
satisfy the condition given in step 1-6. In this case, we choose the pair
$(\mu_1, \mu_2)$ such that $Pr(\mu_1 \to \alpha_2) \cdot Pr(\mu_2 \to \eta_3)$
is maximum. Experimental results show that the condition given in step 1-6 can
be satisfied for each pair $(\alpha_2, \eta_3)$, and the maximum probability of
$Pr(\mu_1 \to \alpha_2) \cdot Pr(\mu_2 \to \eta_3)$ is $2^{-14}$, $2^{-13}$ and
$2^{-12}$ for 3825, 60690 and 510 pairs of $(\alpha_2, \eta_3)$, respectively.
The average probability of $Pr(\mu_1 \to \alpha_2) \cdot Pr(\mu_2 \to \eta_3)$
is $2^{-13.03}$. Similarly, the condition given in step 1-6 can be satisfied
for each pair $(\alpha_4, \eta_5)$ (resp. $(\eta_4, \alpha_5)$), and the
maximum probability of $Pr(\alpha_4 \to \nu_1) \cdot Pr(\eta_5 \to \nu_2)$
(resp. $Pr(\eta_4 \to \sigma_1) \cdot Pr(\alpha_5 \to \sigma_2)$) is $2^{-14}$,
$2^{-13}$ and $2^{-12}$ for 4312, 60203 and 510 pairs of $(\alpha_4, \eta_5)$
(resp. $(\eta_4, \alpha_5)$), respectively. The average probability of
$Pr(\alpha_4 \to \nu_1) \cdot Pr(\eta_5 \to \nu_2)$ (resp.
$Pr(\eta_4 \to \sigma_1) \cdot Pr(\alpha_5 \to \sigma_2)$) is $2^{-13.04}$. The
best choices of $(\mu_1, \mu_2)$ and $(\nu_1, \nu_2)$ (resp.
$(\sigma_1, \sigma_2)$) can be stored in two tables.

Thus, the probability of a four-round differential characteristic proposed in
this subsection is $2^{-6 \cdot 9 - 13.03 - 2 \cdot 13.04} \approx 2^{-93.1}$
on average. Notice that it always exists and can be easily rebuilt by looking
up several tables.
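The two averages and the resulting $2^{-93.1}$ follow from the pair counts
reported for Q2. A minimal sketch of that arithmetic is given below; it assumes
the average is taken over the probabilities themselves rather than over their
exponents, and the names are illustrative.

```python
from math import log2

def avg_prob_log2(counts):
    """counts maps an exponent e to the number of pairs whose best probability is 2^-e."""
    total = sum(counts.values())
    mean = sum(n * 2.0 ** (-e) for e, n in counts.items()) / total
    return log2(mean)

# Pair counts reported for (alpha_2, eta_3) and for (alpha_4, eta_5) / (eta_4, alpha_5).
mu_pairs = {14: 3825, 13: 60690, 12: 510}
nu_sigma_pairs = {14: 4312, 13: 60203, 12: 510}

p_mu = avg_prob_log2(mu_pairs)
p_nu = avg_prob_log2(nu_sigma_pairs)

# Nine S-boxes at 2^-6, one mu pair, and one pair each for nu and sigma.
total_log2 = -6 * 9 + p_mu + 2 * p_nu

print(round(p_mu, 2), round(p_nu, 2), round(total_log2, 1))
```

Both sets of counts sum to $255^2$, i.e. every nonzero pair of differences is
covered.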

A similar process is applied to cases #5 to #12 other than case #10. Two
questions similar to Q1 and Q2 are also experimentally verified to check the
existence of these differential characteristics. To answer question Q1,
$2^{32}$ values of $X = (X_{3,4}, X_{3,6}, X_{3,12}, X_{3,14})$ are enumerated
for cases #5 to #8, and $2^{32}$ values of
$X = (X_{2,0}, X_{2,2}, X_{2,8}, X_{2,10})$ are enumerated for cases #9, #11
and #12. The number of “construction failures” is 412 for cases #5 and #6, 443
for cases #7 and #8, 402 for case #9, and 373 for cases #11 and #12,
respectively. Experimental results show that question Q2 can always be
satisfied. Therefore, we can construct these differential characteristics for
almost all values of the leaked X. The probabilities of these 7 differential
characteristics are around $2^{-93.1}$ with a small deviation.

D Case #4: $[l_0, l_1, l_2, l_3] = [4, 2, 0, 4]$ with
$\chi(\Delta X_{2,0}) = \chi(\Delta X_{2,2}) = 1$

The type of the differential characteristic is illustrated in Fig. 9. The
distribution of active S-boxes in these rounds is $9 \to 6 \to 4 \to 6$, for a
total of 25 active S-boxes. In Fig. 9, from $\Delta X_1$ to $\Delta Z_4$,
squares marked with broken lines are active, squares marked with backslashes
should be chosen to satisfy some conditions, and empty squares have no
difference.
From $\Delta X_1$ to $\Delta Z_4$, bytes without a Greek letter have difference
zero, and the difference of a byte with a Greek letter (i.e., $\alpha$,
$\beta$, $\gamma$, $\eta$, $\mu$, $\nu$ and $\sigma$) will be determined in the
subsequent discussions. Since $\Delta X_5 = MC(\Delta Z_4)$, we obtain the
value of $\lambda_j$ ($1 \le j \le 16$) once $\nu'_i$ and $\sigma'_i$
($3 \le i \le 6$) are determined. The procedure of constructing this
differential characteristic is briefly described as follows.



Fig. 9. Differential characteristics with $[l_0, l_1, l_2, l_3] = [4, 2, 0, 4]$
and $\chi(\Delta X_{2,0}) = \chi(\Delta X_{2,2}) = 1$. Gray squares denote
leaked bytes. Squares marked with broken lines are active, squares marked with
backslashes should be chosen to satisfy some conditions, and empty squares have
no difference.


1. We start at the MC step of Round 1 here, and choose nonzero $\beta_1$ and
$\beta_2$ such that one of $\beta_3, \ldots, \beta_6$ is zero, where
$(\beta_3, \beta_4, \beta_5, \beta_6)^t = MC^{-1} \cdot (\beta_1, 0, \beta_2, 0)^t$.
Thus, for arbitrary $\beta_1 \ne 0$, we can choose
$\beta_2 \in \{D^{-1}E\beta_1, B^{-1}9\beta_1, E^{-1}D\beta_1, 9^{-1}B\beta_1\}$.
$\beta_3, \ldots, \beta_6$ are obtained once $\beta_1$ and $\beta_2$ are
determined. Notice that we have 4 choices of $\beta_2$ for each
$\beta_1 \ne 0$.

2. Compute $\alpha_1$ and $\eta_2$ using the pairs $(X_{2,0}, \beta_1)$ and
$(X_{2,2}, \beta_2)$, respectively.

3. Compute $\alpha_2, \ldots, \alpha_5$ by solving
$(\alpha_4, 0, \alpha_5, 0) = (\alpha_1, \alpha_2, 0, \alpha_3) \cdot MC^t$;
compute $\eta_1$, $\eta_3$, $\eta_4$ and $\eta_5$ by solving
$(\eta_4, 0, \eta_5, 0) = (0, \eta_1, \eta_2, \eta_3) \cdot MC^t$.

4. Choose $(\mu_1, \mu_2)$ (resp. $(\gamma_1, \gamma_2)$) such that
$Pr(\mu_1 \to \alpha_2) \cdot Pr(\mu_2 \to \eta_3) \ne 0$ (resp.
$Pr(\gamma_1 \to \eta_1) \cdot Pr(\gamma_2 \to \alpha_3) \ne 0$) and one of
$\mu_4$ and $\mu_6$ (resp. $\gamma_4$ and $\gamma_6$) is zero. Choose
$(\nu_1, \nu_2)$ (resp. $(\sigma_1, \sigma_2)$) such that
$Pr(\alpha_4 \to \nu_1) \cdot Pr(\eta_5 \to \nu_2) \ne 0$ (resp.
$Pr(\eta_4 \to \sigma_1) \cdot Pr(\alpha_5 \to \sigma_2) \ne 0$) and one of
$\nu_4$ and $\nu_6$ (resp. $\sigma_4$ and $\sigma_6$) is zero.

5. Compute $\mu'_3$, $\mu'_5$, $\gamma'_3$ and $\gamma'_5$ using the pairs
$(X_{1,4}, \mu_3)$, $(X_{1,14}, \mu_5)$, $(X_{1,12}, \gamma_3)$ and
$(X_{1,6}, \gamma_5)$, respectively. Choose $\beta'_i$ ($3 \le i \le 6$) such
that $Pr(\beta'_i \to \beta_i) = 2^{-6}$ if $\beta_i \ne 0$, or $\beta'_i = 0$
if $\beta_i = 0$; choose $\mu'_i$ ($i = 4, 6$) such that
$Pr(\mu'_i \to \mu_i) = 2^{-6}$ if $\mu_i \ne 0$, or $\mu'_i = 0$ if
$\mu_i = 0$; choose $\gamma'_i$ ($i = 4, 6$) such that
$Pr(\gamma'_i \to \gamma_i) = 2^{-6}$ if $\gamma_i \ne 0$, or $\gamma'_i = 0$
if $\gamma_i = 0$.

6. Compute $\nu'_3$, $\nu'_5$, $\sigma'_3$ and $\sigma'_5$ using the pairs
$(X_{4,0}, \nu_3)$, $(X_{4,2}, \nu_5)$, $(X_{4,8}, \sigma_3)$ and
$(X_{4,10}, \sigma_5)$, respectively. Choose $\nu'_i$ ($i = 4, 6$) such that
$Pr(\nu'_i \to \nu_i) = 2^{-6}$ if $\nu_i \ne 0$, or $\nu'_i = 0$ if
$\nu_i = 0$; choose $\sigma'_i$ ($i = 4, 6$) such that
$Pr(\sigma'_i \to \sigma_i) = 2^{-6}$ if $\sigma_i \ne 0$, or $\sigma'_i = 0$
if $\sigma_i = 0$.

7. Compute $\Delta X_5 = MC(\Delta Z_4)$.

The existence of these differential characteristics is only related to the
existence of the pairs $(\mu_1, \mu_2)$, $(\gamma_1, \gamma_2)$,
$(\nu_1, \nu_2)$ and $(\sigma_1, \sigma_2)$ in step 4. Based on the
experimental results given in the construction of Fig. 8, they always exist.
Thus, we have $255 \times 4 = 1020$ differential characteristics here because
$\beta_1$ has 255 choices and $\beta_2$ has four choices for each $\beta_1$.
The average probability of them is

$$2^{-6 \cdot 7 - 13.03 \cdot 2 - 13.04 \cdot 2} = 2^{-94.1}.$$

E Details of One Forgery in the “2-8-12-4” Experiment

The initial state is: 0xe7745/e4/a948d«9.



Fig. 10. Differential Path of type “2-8-12-4”. The hexadecimal numbers indicate the 
difference values. The empty squares indicate there is no difference. The squares of 
leaked bytes are marked with gray color. 


Table 7. The values of round keys

| | Round 1 | Round 2 | Round 3 | Round 4 |
| Block 1 | 0x27de696c86fcc6a71 | 0x0eda00/69a70d28/ | 0xcaa2ca&4/&3c/8a8 | 0x8034/88c57ed2766 |
| Block 2 | 0x69cac/23/6387dd8 | 0xe9d293e0d9550016 | 0x75376aeca8ed970e | 0xelc9150ac5564aad |

F Details of One Forgery in the “6-4-6-9” Experiment

The initial state is: Cte92304e6d967c7373. 


Table 8. The forgery attack on the “2-8-12-4” differential characteristic 



Plaintext 

Ciphertext 

Forged Ciphertext 

Colliding State 

Block 1 

Cte37<fc069161450099 

Cte6c2636071e45d85d 

0x6c6636071e35d85d 

0z623d4/8ee691al3e 

Block 2 

Cte&1469433d739a810 

0z39d7ac987dd694a8 

0z53bal02c0dl64435 
Fig. 11. Differential Path of type “6-4-6-9”. The hexadecimal numbers indicate the
difference values. The empty squares indicate there is no difference. The squares of 
leaked bytes are marked with gray color. 


Table 9. The values of round keys

         Round 1              Round 2              Round 3              Round 4
Block 1  0x60ee23ea2d7054dd   0xc/849ed86e6774c0   0x569d49934668a/00   0x64601c65561255c8
Block 2  0x36a5467<fc8e6e9d2  0x6e9da2683ae39382   0x724461aa616e86e2   0xa396ceccaa9d57 /6

Table 10. The forgery attack on the “6-4-6-9” differential characteristic


Plaintext 

Ciphertext 

Forged Ciphertext 

Colliding State 

Block 1 

0xl82841a869/5e890 

0x7660dcele61d0d43 

0x06c0d7e8361d0d41 

0x/134343/a5620472 

Block 2 

0x35&d62a519a0818/ 

0xa3398a6/cd7/cdld 

0:r646cac5a462/92a8 
Abstract. We present the Protected-IV construction (PIV), a simple,
modular method for building variable-input-length tweakable ciphers. At
our level of abstraction, many interesting design opportunities surface.
For example, there is an obvious pathway to building beyond birthday-bound
secure tweakable ciphers with performance competitive with existing
birthday-bound-limited constructions. As part of our design space
exploration, we give two fully instantiated PIV constructions, TCT1 and
TCT2; the latter is fast and has beyond birthday-bound security, the
former is faster and has birthday-bound security. Finally, we consider a
generic method for turning a VIL tweakable cipher (like PIV) into an
authenticated encryption scheme that admits associated data, can
withstand nonce-misuse, and allows for multiple decryption error messages.
Thus, the method offers robustness even in the face of certain
side channels and common implementation mistakes.

Keywords: tweakable blockciphers, beyond-birthday-bound security, 
authenticated encryption, associated data, full-disk encryption. 


1 Introduction 

The main contribution of this paper is the Protected-IV construction (PIV); see
Figure 1. PIV offers a simple, modular method for building length-preserving,
tweakable ciphers that:

(1) may take plaintext inputs of essentially any length;

(2) provably achieve the strongest possible security property for this type of
primitive, that of being a strong, tweakable-PRP (STPRP);

(3) admit instantiations from n-bit primitives that are STPRP-secure well
beyond the birthday bound of $2^{n/2}$ invocations.

Moreover, by some measures of efficiency, beyond-birthday secure instantiations
of PIV are competitive with existing constructions that are only secure to the
birthday bound. (See Table 1.) We will give a concrete instantiation of PIV
that has beyond birthday-bound security and whose overhead, when compared to
EME [16], is a few extra modular arithmetic operations for each n-bit block of
input.

K. Sako and P. Sarkar (Eds.) ASIACRYPT 2013 Part I, LNCS 8269, pp. 405-gS3] 2013. 
Tweakable ciphers with beyond birthday-bound security may have important
implications for cryptographic practice, for example in large-scale
data-at-rest settings, where the amount of data that must be protected by a
single key is typically greater than in settings where keys can be easily
renegotiated.

At least two important applications have already made tweakable ciphers
their tool-of-choice, namely full-disk encryption (FDE) and
format-preserving encryption (FPE). Our work provides interesting new results
for both FDE and FPE.

Fig. 1. The PIV[F, V] tweakable cipher. Input T is the tweak, and
$X = X_L \,\|\, X_R$ is a bit string, where $|X_L| = N$ and $X_R$ is any
length accepted by V. The filled-in box is the tweak input.

We also show that tweakable ciphers enable a simple mechanism for building
authenticated encryption schemes with associated data (AEAD), via an
extension of the encode-then-encipher approach of Bellare and
Rogaway [1]. This approach has some practical benefits; for example, it securely
handles the reporting of multiple types of decryption errors. It can also eliminate
ciphertext expansion by exploiting any existing nonces, randomness, or
redundancies appearing in either the plaintext or associated data inputs. Combined
with our other results, encode-then-encipher over PIV gives a new way to build
AEAD schemes with beyond birthday-bound security.



Background. Tweakable blockciphers (TBCs) were introduced and formalized
by Liskov, Rivest and Wagner [20]. An n-bit TBC $\widetilde{E}$ is a family of
permutations over $\{0,1\}^n$, each permutation named by specifying a key and a
tweak. In typical usage, the key is secret and fixed across many calls, while
the tweak is not secret, and may change from call to call; this allows
variability in the behavior of the primitive, even though the key is fixed. A
tweakable cipher$^1$ is the natural extension of a tweakable blockcipher to the
variable-input-length (VIL) setting, forming a family of length-preserving
permutations.

Since the initial work of Liskov, Rivest and Wagner, there has been substantial
work on building tweakable ciphers. Examples capable of handling long inputs
(required for FDE) include CMC [15], EME [16], HEH [30], HCH [10], and
HCTR [33]. Loosely speaking, the common approach has been to build up
the VIL primitive from an underlying n-bit blockcipher, sometimes in concert
with one or more hashing operations. The security guaranteed by each of these
constructions becomes vacuous after about $2^{n/2}$ bits have been enciphered.
One of our main goals is to break through this birthday bound, i.e., to build a
tweakable cipher that remains secure long after $2^{n/2}$ bits have been
enciphered.

1 Sometimes called a “tweakable enciphering scheme”, or even a “large-block cipher”.

The PIV construction. To this end, we begin by adopting a top-down,
compositional viewpoint on the design of tweakable ciphers, which leads to our
PIV construction. It is a type of three-round, unbalanced Feistel network,
where the left “half” of the input is of a fixed bit length N, and the right
“half” has variable length. The first and third round-functions are an N-bit
tweakable blockcipher (F), where N is a parameter of the construction, e.g.
N = 128 or N = 256. The middle round-function (V) is itself a VIL tweakable
cipher, whose tweak is the output of the first round.

It may seem as though little has been accomplished, since we need a VIL
tweakable cipher V in order to build our VIL tweakable cipher PIV[F, V].
However, we require substantially less of V than we do of PIV[F, V]. In
particular, the target security property for PIV is that of being a strong
tweakable pseudorandom permutation. Informally, being STPRP-secure means
withstanding chosen-ciphertext attacks in which the attacker also has full
control over all inputs. The attacker can, for example, repeat a tweak an
arbitrary number of times. Our PIV security theorem (Theorem 1) says the
following: given (1) a TBC F that is STPRP-secure over a domain of N-bit
strings, and (2) a tweakable cipher V that is secure against attacks that never
repeat a tweak, then the tweakable cipher PIV[F, V] is STPRP-secure. Thus,
qualitatively, the PIV construction promotes security (over a large domain)
against a restricted kind of attacker, into security against arbitrary
chosen-ciphertext attacks.

Quantitatively, the PIV security bound contains an additive term $q^2/2^N$,
where q is the number of times PIV is queried. Now, N might be the blocksize
n of some underlying blockcipher; in this case the PIV composition delivers
a bound comparable to those achieved by existing constructions. But N = 2n
presents the possibility of using an n-bit primitive to instantiate F and V, and
yet deliver a tweakable cipher with security well beyond the birthday bound of
$2^{n/2}$ queries.

As a small, additional benefit, the PIV proof of STPRP-security is short and 
easy to verify. 

Impacts of modularity on instantiations. Adopting this modular viewpoint al- 
lows us to explore constructions of F and V independently. This is particularly 
beneficial, since building efficient and secure instantiations of VIL tweakable ci- 
phers (V) is relatively easy, when tweaks can be assumed not to repeat. The 
more difficult design task, of building a tweakable blockcipher (F) that remains 
secure when tweaks may be repeated, is also made easier, by restricting to plain- 
text inputs of a fixed bit length N. In practice, when (say) N = 128 or 256, 
inefficiencies incurred by F can be offset by efficiency gains in V . 

To make things concrete, we give two fully-specified PIV tweakable ciphers,
each underlain by n-bit blockciphers. The first, TCT1, provides birthday-bound
security. It requires only one blockcipher invocation and some arithmetic, modulo
a power of two, per n-bit block of input. In contrast, previous modes either
require two blockcipher invocations per n-bit block, or require per-block finite
field operations.

The second, TCT2, delivers security beyond the birthday bound. When
compared to existing VIL tweakable ciphers with only birthday-bound security, like
the EME* construction, TCT2 incurs only some additional, simple arithmetic
operations per n-bit block of input. Again, this arithmetic is performed modulo
powers of two, rather than in a finite field.

Fig. 2. Security bounds for TCT1, EME and TCT2, all using an underlying 128-bit
primitive and 4096-byte inputs, typical for FDE. The EME curve is representative of
other prior constructions.

In both TCT1 and TCT2, the VIL component is instantiated using counter-mode
encryption, but over a TBC instead of a blockcipher. The additional tweak
input of the TBC allows us to consider various ‘tweak-scheduling’ approaches,
e.g. fixing a single per-message tweak across all blocks, or changing the tweak
each message block.$^2$ We will see that the latter approach of re-tweaking on a
block-by-block basis leads to a beyond birthday-bound secure PIV construction
that admits strings of any length at least N.

AEAD via encode-then-(tweakable) encipher. The ability to construct beyond
birthday-bound secure tweakable ciphers with large and flexible domains
motivates us to consider their use for traditional encryption. Specifically, we build
upon the “encode-then-encipher” results of Bellare and Rogaway [1]. They show
that messages endowed with randomness (or nonces) and redundancy do not
need to be processed by an authenticated encryption (AE) scheme in order to
enjoy privacy and authenticity guarantees; a VIL strong-PRP suffices. This is
valuable when typical messages are short, as there is no need to waste bandwidth
upon transmitting an AE scheme’s IV and a dedicated authenticity tag.

2 There is a natural connection between changing the tweak of a TBC, and changing
the key of a blockcipher. Both can be used to boost security, but the former is cleaner
because tweaks do not need to be secret.

We find that the tweakable setting gives additional advantages to the
encode-then-encipher approach. An obvious one is that the tweak empowers support
for associated data. More interesting, one can explore the effects of randomness,
state or redundancy present in the message and tweak inputs. We find that
randomness and state can be shifted from the message to the tweak without
loss of security, potentially reducing the number of bits that must be processed
cryptographically.

We also find that AEAD schemes built this way, via encode-then-encipher
over a tweakable cipher, can accommodate multiple decryption error messages.
Multiple, descriptive error messages can be quite useful in practice, but have
often empowered damaging attacks (e.g. padding-oracle attacks [321712711112]).
These attacks don’t work against our AEAD schemes because, loosely, changing
any bit of a ciphertext will randomize every bit of the decrypted string.

Our work in this direction suggests useful implications for FPE [315] , and for 
layered-encryption schemes, for example the onion-encryption scheme used by 
Tor [23]. 

Due to space limitations, we refer the reader to the full version of this paper 
for our results on AEAD, and a discussion of their potential impacts. 


Related work. Here we give a much abbreviated discussion of other related work.
Please refer to Table 1 for a summary comparison of TCT1 and TCT2 with other
constructions. A more complete discussion will appear in the full version.

Researchers have developed three general approaches for constructing tweakable
ciphers from n-bit blockciphers. Each approach has yielded a series of
increasingly refined algorithms. The first, Encrypt-Mask-Encrypt, places a light-weight
“masking” layer between two encryption layers; examples include CMC [15] and
EME* [13]. The second, Hash-ECB-Hash, sandwiches ECB-mode encryption
between two invertible hashes. PEP [9], TET [14], and HEH [30,31] are examples.
Finally, Hash-CTR-Hash uses non-invertible hashes with CTR-mode encryption.
Both HCH [10] and HCTR [33] use this approach. Mancillas-López et al. [33]
report on the hardware performance of most of these modes. Chakraborty et
al. [5] discuss implementations of the more recent HEH [30] construction and its
refinement E2, which halves the number of finite field multiplications.

We contribute a new, top-down approach that leads us to the first
beyond-birthday-bound secure tweakable cipher suitable for encrypting long inputs (i.e.,
longer than the blocksize of an underlying blockcipher). Table 1 and Figure 2
compare some of these algorithms with our new TCT1 and TCT2 constructions
in terms of computational cost and security, respectively. Note that the finite
field operations counted in Table 1 take hundreds of cycles in software [2112],
whereas their cost relative to an AES blockcipher invocation is much lower in
hardware [33]. TCT1 is the first tweakable cipher to require only a single
blockcipher invocation and no extra finite field multiplications for each additional n
bits of input, while TCT2 is the first to provide beyond-birthday-bound security
(and still gets away with a fixed number of finite field multiplications).

Table 1. Tweakable ciphers and their computational costs for $\ell n$-bit inputs. Costs
measured in n-bit blockcipher calls [BC], finite field multiplications [$F_{2^n} \times$], and ring
operations [$Z_w +$] and [$Z_{2w} \times$], for some word size w. Typically, $\ell = 32$ for FDE, and we
anticipate n = 128, w = 64.

Cipher  [BC]          [$F_{2^n} \times$]  [$Z_w +$]  [$Z_{2w} \times$]  Ref.
HCTR    $\ell$        $2\ell + 2$         -          -                  [33]
CMC     $2\ell + 1$   -                   -          -                  [15]
EME     $2\ell + 1$   -                   -          -                  [16]
EME*    $2\ell + 3$   -                   -          -                  [13]
PEP     $\ell + 5$    $4\ell - 6$         -          -                  [9]
HCH     $\ell + 3$    $2\ell - 2$         -          -                  [10]
TET     $\ell$        $2\ell$             -          -                  [14]
HEH     $\ell + 1$    $\ell + 2$          -          -                  [30,31]
TCT1    $\ell + 1$    5
TCT2    $2\ell + 8$   32

We mention the LargeBlock constructions due to Minematsu and Iwata [25], 
since they provide ciphers with beyond-birthday-bound security. These do not 
support tweaking, but it seems plausible that they could without significant 
degradation of performance or security. These constructions overcome the birth- 
day bound by using 2n-bit blockciphers as primitives, which are in turn con- 
structed from an n-bit TBC. To our knowledge, CLRW2 [19] is the most efficient 
n-bit TBC with beyond-birthday-bound security that supports the necessary 
tweakspace (Minematsu’s TBC [21] limits tweak lengths to fewer than n/2 bits). 
Compared to TCT2, instantiating the LargeBlock constructions with this
primitive ultimately requires an extra six finite field multiplications for each n bits
of input. Thus, we suspect the LargeBlock designs would be impractical even if 
adding tweak support proves feasible. 

A construction due to Coron, et al., which we refer to as CDMS (after the
authors), builds a 2n-bit TBC from an n-bit TBC, providing beyond-birthday- 
bound security in n. Like PIV, CDMS uses three rounds of a Feistel-like structure. 
However, our middle round uses a VIL tweakable cipher, and we require a weaker 
security property from the round. This allows PIV to efficiently process long in- 
puts. That said, CDMS provides an excellent way to implement a highly-secure 
2n-bit TBC, and we will use it for this purpose inside of TCT2 to build F.
(Nesting CDMS constructions could create $(2^m n)$-bit tweakable blockciphers for any
$m > 1$, but again, this would not be practical.) We note that Coron, et al. were
primarily concerned with constructions indifferentiable from an ideal cipher, a 
goal quite different from ours. 

The Thorp shuffle [2B] and its successor, swap-or-not [17], are highly-secure
ciphers targeting very small domains (e.g., $\{0,1\}^n$ for $n < 64$). Swap-or-not
could almost certainly become a VIL tweakable cipher, without changing the
security bounds, by using domain separation for each input length and tweak in
the underlying PRF. Essentially, one would make an input-length parameterized
family of (tweakable) swap-or-not ciphers, with independent round keys for each
length. While still offering reasonable performance and unmatched security for
very small inputs, the result would be wildly impractical for the large domains
we are considering: swap-or-not’s PRF needs to be invoked at least 6b times to
securely encipher a b-bit input (below that, the bound becomes vacuous against
even q = 1 query), and disk sectors are often 4096 bytes. Also, to match TCT2’s
security, the PRF itself would need to be secure beyond the birthday bound
(with respect to n).

Finally, we note that Rogaway and Shrimpton [21] considered some forms of 
tweakable encode-then-encipher in the context of deterministic AE (“keywrap- 
ping”), and our work generalizes theirs. 


2 Tweakable Primitives 

Preliminary notation. Let $\mathbb{N} = \{0,1,2,\ldots\}$ be the set of non-negative integers.
For $n \in \mathbb{N}$, $\{0,1\}^n$ denotes the set of all n-bit binary strings, and $\{0,1\}^*$ denotes
the set of all (finite) binary strings. We write $\varepsilon$ for the empty string. Let
$s, t \in \{0,1\}^*$. Then $|s|$ is the length of s in bits, and $|(s,t)| = |s \,\|\, t|$, where
$s \,\|\, t$ denotes the string formed by concatenating s and t. If $s \in \{0,1\}^{nm}$ for
some $m \in \mathbb{N}$, $s_1 s_2 \cdots s_m \xleftarrow{n} s$ indicates that each $s_i$ should be defined so that
$|s_i| = n$ and $s = s_1 s_2 \cdots s_m$. When n is implicit from context, it will be omitted from the
notation. If $s = b_1 b_2 \cdots b_n$ is an n-bit string (each $b_i \in \{0,1\}$), then
$s[i..j] = b_i b_{i+1} \cdots b_j$, $s[i..] = s[i..n]$, and $s[..j] = s[1..j]$. The string $s \oplus t$ is the bitwise
xor of s and t; if, for example, $|s| < |t|$, then $s \oplus t$ is the bitwise xor of s and
$t[..|s|]$. Given $R \subseteq \mathbb{N}$ and $n \in \mathbb{N}$ with $n \le \min(R)$, $\{0,1\}^R = \bigcup_{i \in R} \{0,1\}^i$; and
by abuse of notation, $\{0,1\}^{R-n} = \bigcup_{i \in R} \{0,1\}^{i-n}$. Given a finite set $\mathcal{X}$, we write
$X \xleftarrow{\$} \mathcal{X}$ to indicate that the random variable X is sampled uniformly at random
from $\mathcal{X}$. Throughout, the distinguished symbol $\bot$ is assumed not to be part of
any set except $\{\bot\}$. Given an integer n known to be in some range, $\langle n \rangle$ denotes
some fixed-length (e.g., 64-bit) encoding of n.

Let $H: \mathcal{K} \times \mathcal{D} \to \mathcal{R} \subseteq \{0,1\}^*$ be a function. Writing its first argument
as a subscripted key, H is $\epsilon$-almost universal ($\epsilon$-AU) if for all distinct
$X, Y \in \mathcal{D}$, $\Pr[H_K(X) = H_K(Y)] \le \epsilon$ (where the probability is over $K \xleftarrow{\$} \mathcal{K}$).
Similarly, H is $\epsilon$-almost 2-XOR universal if for all distinct $X, Y \in \mathcal{D}$ and $C \in \mathcal{R}$,
$\Pr[H_K(X) \oplus H_K(Y) = C] \le \epsilon$.

An adversary is an algorithm taking zero or more oracles as inputs, which it
queries in a black-box manner before returning some output. Adversaries may
be randomized. The notation $A^f \Rightarrow b$ denotes the event that an adversary A outputs
b after running with oracle f as its input.

Syntax. Let $\mathcal{K}$ be a non-empty set, and let $\mathcal{T}, \mathcal{X} \subseteq \{0,1\}^*$. A tweakable cipher is
a mapping $\widetilde{E}: \mathcal{K} \times \mathcal{T} \times \mathcal{X} \to \mathcal{X}$ with the property that, for all $(K,T) \in \mathcal{K} \times \mathcal{T}$,
$\widetilde{E}(K,T,\cdot)$ is a permutation on $\mathcal{X}$. We typically write the first argument (the
key) as a subscript, so that $\widetilde{E}_K(T,X) = \widetilde{E}(K,T,X)$. As $\widetilde{E}_K(T,\cdot)$ is invertible,
we let $\widetilde{E}_K^{-1}(T,\cdot)$ denote the inverse mapping. We refer to $\mathcal{K}$ as the key space, $\mathcal{T}$ as the
tweak space, and $\mathcal{X}$ as the message space. We say that a tweakable cipher $\widetilde{E}$
is length preserving if $|\widetilde{E}_K(T,X)| = |X|$ for all $X \in \mathcal{X}$, $T \in \mathcal{T}$, and $K \in \mathcal{K}$.
All tweakable ciphers in this paper will be length preserving. Restricting the
tweak or message spaces of a tweakable cipher gives rise to other objects. When
$\mathcal{X} = \{0,1\}^n$ for some $n > 0$, then $\widetilde{E}$ is a tweakable blockcipher with blocksize n.
When $|\mathcal{T}| = 1$, we make the tweak implicit, giving a cipher $E: \mathcal{K} \times \mathcal{X} \to \mathcal{X}$,
where $E_K(\cdot)$ is a (length-preserving) permutation over $\mathcal{X}$ and $E_K^{-1}$ is its inverse.
Finally, when $\mathcal{X} = \{0,1\}^n$ and $|\mathcal{T}| = 1$, we have a conventional blockcipher
$E: \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$.

Security notions. Let $\mathrm{Perm}(\mathcal{X})$ denote the set of all permutations on $\mathcal{X}$.
Similarly, we define $\mathrm{BC}(\mathcal{K}, \mathcal{X})$ to be the set of all ciphers with keyspace $\mathcal{K}$ and message
space $\mathcal{X}$. When $\mathcal{X}, \mathcal{X}'$ are sets, we define $\mathrm{Func}(\mathcal{X}, \mathcal{X}')$ to be the set of all functions
$f: \mathcal{X} \to \mathcal{X}'$.

Fix a tweakable cipher $\widetilde{E}: \mathcal{K} \times \mathcal{T} \times \mathcal{X} \to \mathcal{X}$. We define the strong,
tweakable pseudorandom-permutation (STPRP) advantage measure as

$\mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{\widetilde{E}}(A) = \Pr[K \xleftarrow{\$} \mathcal{K} : A^{\widetilde{E}_K(\cdot,\cdot),\, \widetilde{E}_K^{-1}(\cdot,\cdot)} \Rightarrow 1] - \Pr[\Pi \xleftarrow{\$} \mathrm{BC}(\mathcal{T}, \mathcal{X}) : A^{\Pi(\cdot,\cdot),\, \Pi^{-1}(\cdot,\cdot)} \Rightarrow 1]$.

The TPRP advantage measure is defined analogously, by dropping the $\widetilde{E}_K^{-1}$
oracle from the first probability, and the $\Pi^{-1}$ oracle from the second. We
assume that A never makes pointless queries. By this we mean that for the
(S)TPRP experiments, the adversary never repeats a query to an oracle. For
the STPRP advantage measure, this also means that if A queries (T, X) to its
leftmost oracle and receives Y in return, then it never queries (T, Y) to its
rightmost oracle, and vice versa. These assumptions are without loss of generality.

The strong, indistinguishable-from-random-bits (SRND) advantage is
defined as

$\mathbf{Adv}^{\mathrm{srnd}}_{\widetilde{E}}(A) = \Pr[K \xleftarrow{\$} \mathcal{K} : A^{\widetilde{E}_K(\cdot,\cdot),\, \widetilde{E}_K^{-1}(\cdot,\cdot)} \Rightarrow 1] - \Pr[A^{\$(\cdot,\cdot),\, \$(\cdot,\cdot)} \Rightarrow 1]$,

where the $\$(\cdot,\cdot)$ oracle always outputs a random string equal in length to its
second input: $|\$(T, X)| = |X|$ for all T and X. As before, we assume that A
never makes a pointless query. Here, these assumptions are not without loss of
generality, but instead prevent trivial wins. Adversaries for the (S)TPRP and
SRND advantages are nonce-respecting if the transcript of their oracle queries
$(T_1, X_1), \ldots, (T_q, X_q)$ does not include $T_i = T_j$ for any $i \ne j$.

For a cipher $E: \mathcal{K} \times \mathcal{X} \to \mathcal{X}$, we define the strong, pseudorandom
permutation (SPRP) advantage as

$\mathbf{Adv}^{\mathrm{sprp}}_{E}(A) = \Pr[K \xleftarrow{\$} \mathcal{K} : A^{E_K(\cdot),\, E_K^{-1}(\cdot)} \Rightarrow 1] - \Pr[\pi \xleftarrow{\$} \mathrm{Perm}(\mathcal{X}) : A^{\pi(\cdot),\, \pi^{-1}(\cdot)} \Rightarrow 1]$.

As above, the PRP advantage is defined
analogously, by dropping the $E_K^{-1}$ oracle from the first probability, and the $\pi^{-1}$
oracle from the second. We again assume (without loss of generality) that the
adversary does not make pointless queries.

For all security notions in this paper, we track three adversarial resources: the
time complexity t, the number of oracle queries q, and the total length of these
queries $\mu$. The time complexity of A is defined to include the complexity of its
enveloping probability experiment (including sampling of keys, oracle
computations, etc.), and we define the parameter t to be the maximum time complexity
of A, taken over both experiments in the advantage measure.$^3$

3 The PIV Construction 

We begin by introducing our high-level abstraction, PIV, shown in Figure 1.
Let $\mathcal{T} = \{0,1\}^t$ for some $t \ge 0$, and let $\mathcal{Y} \subseteq \{0,1\}^*$ be such that if $Y \in \mathcal{Y}$,
then $\{0,1\}^{|Y|} \subseteq \mathcal{Y}$. Define $\mathcal{T}' = \mathcal{T} \times \mathcal{Y}$. Fix an integer $N > 0$. Let
$F: \mathcal{K}' \times \mathcal{T}' \times \{0,1\}^N \to \{0,1\}^N$ be a tweakable blockcipher and let
$V: \mathcal{K} \times \{0,1\}^N \times \mathcal{Y} \to \mathcal{Y}$ be a tweakable cipher. From these, we produce a new tweakable cipher
$\mathrm{PIV}[F,V]: (\mathcal{K}' \times \mathcal{K}) \times \mathcal{T} \times \mathcal{X} \to \mathcal{X}$, where $\mathcal{X} = \{0,1\}^N \times \mathcal{Y}$. As shown in
Figure 1, the PIV composition of F, V is a three-round Feistel construction,
working as follows. On input (T, X), let $X = X_L \,\|\, X_R$ where $|X_L| = N$. First,
create an N-bit string $IV = F_{K'}(T \,\|\, X_R, X_L)$. Next, use this IV to encipher
$X_R$, creating a string $Y_R = V_K(IV, X_R)$. Now create an N-bit string
$Y_L = F_{K'}(T \,\|\, Y_R, IV)$, and return $Y_L \,\|\, Y_R$ as the value of $\mathrm{PIV}[F,V]_{K',K}(T,X)$. The
inverse $\mathrm{PIV}[F,V]^{-1}_{K',K}(T,Y)$ is computed in the obvious manner.
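
The three rounds described above can be sketched in Python as follows. This is only an illustration of the data flow, not an instantiation from the paper: `toy_F` (a 4-round Feistel network keyed through SHAKE-256) and `toy_V` (an IV-derived keystream XOR) are hypothetical stand-ins with no security claim, N = 128 is an arbitrary choice, and strings are handled at byte granularity. The tweak T is assumed to have a fixed length, as in the text, so that T || X_R parses uniquely.

```python
import hashlib

N_BYTES = 16  # N = 128 bits (illustrative choice, not mandated by the text)

def _prf(key: bytes, tweak: bytes, data: bytes, out_len: int) -> bytes:
    """Toy keyed function: SHAKE-256 over length-prefixed inputs (assumption)."""
    h = hashlib.shake_256()
    for part in (key, tweak, data):
        h.update(len(part).to_bytes(8, "big") + part)
    return h.digest(out_len)

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_F(key: bytes, tweak: bytes, block: bytes, inverse: bool = False) -> bytes:
    """Hypothetical stand-in for the N-bit TBC F: a 4-round balanced Feistel."""
    half = N_BYTES // 2
    left, right = block[:half], block[half:]
    if not inverse:
        for r in range(4):
            left, right = right, _xor(left, _prf(key, tweak, bytes([r]) + right, half))
    else:
        for r in reversed(range(4)):
            left, right = _xor(right, _prf(key, tweak, bytes([r]) + left, half)), left
    return left + right

def toy_V(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for V: XOR with a keystream derived from the IV.
    XOR is an involution, so the same routine also computes the inverse."""
    return _xor(data, _prf(key, iv, b"V", len(data)))

def piv_encipher(k_f: bytes, k_v: bytes, tweak: bytes, x: bytes) -> bytes:
    assert len(x) >= N_BYTES
    x_l, x_r = x[:N_BYTES], x[N_BYTES:]
    iv = toy_F(k_f, tweak + x_r, x_l)      # IV  = F_{K'}(T || X_R, X_L)
    y_r = toy_V(k_v, iv, x_r)              # Y_R = V_K(IV, X_R)
    y_l = toy_F(k_f, tweak + y_r, iv)      # Y_L = F_{K'}(T || Y_R, IV)
    return y_l + y_r

def piv_decipher(k_f: bytes, k_v: bytes, tweak: bytes, y: bytes) -> bytes:
    assert len(y) >= N_BYTES
    y_l, y_r = y[:N_BYTES], y[N_BYTES:]
    iv = toy_F(k_f, tweak + y_r, y_l, inverse=True)    # undo the third round
    x_r = toy_V(k_v, iv, y_r)                          # undo the middle round
    x_l = toy_F(k_f, tweak + x_r, iv, inverse=True)    # undo the first round
    return x_l + x_r
```

Deciphering simply runs the rounds in reverse order: the third-round call to F is inverted first to recover IV, which is then used as the tweak of V to recover X_R.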

At first glance, it seems that nothing interesting has been accomplished: we 
took an TV-bit TBC and a tweakable cipher, and produced a tweakable cipher 
with a slightly larger domain. However, the following theorem statement begins 
to surface what our abstraction delivers. 

Theorem 1. Let sets $\mathcal{T}, \mathcal{Y}, \mathcal{T}', \mathcal{X}$ and integer N be as above. Let
$F: \mathcal{K}' \times \mathcal{T}' \times \{0,1\}^N \to \{0,1\}^N$ be a tweakable blockcipher, and let
$V: \mathcal{K} \times \{0,1\}^N \times \mathcal{Y} \to \mathcal{Y}$ be a tweakable cipher. Let PIV[F, V] be as just described. Let A be an adversary
making $q < 2^N/4$ queries totaling $\mu$ bits and running in time t. Then there exist
adversaries B and C, making q and 2q queries, respectively, and both running in
O(t) time such that
$\mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{\mathrm{PIV}[F,V]}(A) \le \mathbf{Adv}^{\mathrm{srnd}}_{V}(B) + \mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{F}(C) + \frac{4q^2}{2^N}$,
where B is nonce-respecting and its queries total $\mu - qN$ bits in length.

The first thing to notice is that the VIL portion of the PIV composition, V, need 
be SRND-secure against nonce-respecting adversaries only. As we will see in the 
next section, it is easy to build efficient schemes meeting this requirement. Only 
the FIL portion, F, needs to be secure against STPRP adversaries that can 
use arbitrary querying strategies. Thus the PIV composition promotes nonce- 
respecting security over a large domain into full STPRP security over a slightly 
larger domain. 

The intuition for why this should work is made clear by the picture. Namely,
if F is a good STPRP, then if any part of T or X is “fresh”, then the string
IV should be random. Hence it is unlikely that an IV value is repeated, and
so nonce-respecting security of the VIL component is enough. Likewise when
deciphering, if any part of T, Y is “fresh”.

3 We do this simply to make our theorem statements easier to read. A more explicit
accounting of time resources in reductions, e.g. separating the running time of A
from the time to run cryptographic objects “locally”, would not significantly alter
any of our results.

The term $4q^2/2^N$ accounts for collisions in IV and the difference between F
and a random function. This is a birthday-bound term in N, the blocksize of F.
Since most TBC designs employ (one or more) underlying blockciphers, we have
deliberately chosen the notation N, rather than n, to stress that the blocksize of
F can be larger than that of some underlying blockcipher upon which it might be
built. Indeed, we’ll see in the next section that, given an n-bit blockcipher (and
a hash function), we can build F with N = 2n. This gives us hope of building
beyond birthday-bound secure VIL STPRPs in a modular fashion; we will do
so, and with relatively efficient constructions, too.
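
To see what the additive term of Theorem 1 means numerically, the short sketch below evaluates its base-2 logarithm for q = 2^k queries. The function name `log2_piv_term` and the sample query counts are illustrative assumptions; only the formula 4q^2/2^N comes from the theorem.

```python
def log2_piv_term(log2_q: float, n_bits: int) -> float:
    """Return log2(4 * q^2 / 2^N) for q = 2^log2_q and blocksize N = n_bits."""
    return 2 + 2 * log2_q - n_bits

# With N = n = 128, the term alone reaches 1 (log2 = 0) at q = 2^63.
print(log2_piv_term(63, 128))
# With N = 2n = 256, the same query count leaves the term at 2^-126.
print(log2_piv_term(63, 256))
```

Doubling N from n to 2n therefore moves the point at which this term becomes vacuous from roughly 2^{n/2} to roughly 2^n queries, which is the quantitative reason for building F with N = 2n.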

It will come as no surprise that, if one does away with the lower F invocation
and returns $IV \,\|\, Y_R$, the resulting composition does not generically deliver a
secure STPRP. On the other hand, it is secure as a TPRP (just not a strong
TPRP). This can be seen through a straightforward modification of the PIV
security proof.


4 Concrete Instantiations of PIV 

Instantiating a PIV composition requires two objects, a (fixed-input-length)
tweakable blockcipher F with an N-bit blocksize, and a variable-input-length
tweakable cipher V. In this section we explore various ways to instantiate these
two objects, under the guidance of Theorem 1 and practical concerns.

Theorem 1 suggests setting N to be as large as possible, so that the final term
is vanishingly small for any realistic number of queries. But for this to be useful,
one must already know how to build a TBC F with domain $\{0,1\}^N$ for a large N,
and for which $\mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{F}(C)$ approaches $q^2/2^N$. To our knowledge, there are no
efficient constructions that permit $\mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{F}(C)$ to be smaller than $O(q^3/2^{2n})$
when using an n-bit blockcipher as a starting point. (A recent result by Lampe
and Seurin [18] shows how to beat this security bound, but at a substantial
performance cost.) A construction by Coron, et al., which will be discussed in
more detail shortly, does meet this bound$^4$ while providing N = 2n.

So we restrict our attention to building TBC F with small N. In particular,
we follow the common approach of building TBCs out of blockciphers. Letting n
be the blockcipher blocksize, we will consider N = n, and N = 2n. In the former
case, Theorem 1 only promises us security up to roughly $q = 2^{n/2}$, which is
the birthday bound with respect to the blockcipher. With this security bound
in mind, we can use simple and efficient constructions of both F and the VIL
tweakable cipher V. On the other hand, when N = 2n, Theorem 1 lets us hope
for security to roughly $q = 2^n$ queries. To realize this hope we will need a bit
more from both F and V, but we will still find reasonably efficient constructions
delivering beyond birthday-bound security.

In what follows, we will sometimes refer to objects constructed in other works.
These are summarized for convenience in Figure 5, found in the appendix.

4 However, nesting this construction to provide a VIL tweakable cipher is prohibitively
inefficient.

An efficient VIL tweakable cipher. We will start by considering general methods
for constructing the VIL tweakable cipher, V. Recall that V need only be secure
against adversaries that never repeat a tweak. In Figure 3 we see an analogue
of conventional counter-mode encryption, but over an n-bit TBC $\widetilde{E}$ instead of a
blockcipher. Within a call (T, X) to TCTR, each n-bit block $X_i$ of the input X is
processed using a per-block tweak $T_i$, this being determined by a function
$g: \mathcal{T}' \times \mathbb{N} \to \mathcal{T}$ of the input tweak T and the block index i.

procedure $\mathrm{TCTR}[\widetilde{E}]_K(T, X)$:
  $X_1, X_2, \ldots, X_v \xleftarrow{n} X$
  for i = 1 to v
    $T_i \leftarrow g(T, i)$; $Z_i \leftarrow \langle i \rangle$
    $Y_i \leftarrow \widetilde{E}_K(T_i, Z_i) \oplus X_i$
  Return $Y_1, Y_2, \ldots, Y_v$

procedure $\mathrm{TCTR}[\widetilde{E}]^{-1}_K(T, Y)$:
  $Y_1, Y_2, \ldots, Y_v \xleftarrow{n} Y$
  for i = 1 to v
    $T_i \leftarrow g(T, i)$; $Z_i \leftarrow \langle i \rangle$
    $X_i \leftarrow Y_i \oplus \widetilde{E}_K(T_i, Z_i)$
  Return $X_1, \ldots, X_v$

Fig. 3. The TCTR VIL tweakable cipher
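
The TCTR procedure of Figure 3 can be sketched in Python as below. The TBC `toy_tbc` is a hypothetical stand-in (a 4-round Feistel network keyed through BLAKE2b) with no security claim, the blocksize n = 128 and the byte-level encoding of the counter are assumptions, and the re-tweaking function shown in the usage lines is only one possible choice of g, not necessarily the one used in the paper's instantiations.

```python
import hashlib

BLOCK = 16  # n = 128 bits (illustrative)

def _round(key: bytes, tweak: bytes, r: int, half: bytes) -> bytes:
    """Toy Feistel round function built from keyed BLAKE2b (assumption)."""
    h = hashlib.blake2b(digest_size=BLOCK // 2, key=key)
    h.update(len(tweak).to_bytes(8, "big") + tweak + bytes([r]) + half)
    return h.digest()

def toy_tbc(key: bytes, tweak: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the n-bit TBC E_K(T, X); forward direction only,
    which is all that counter mode needs."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        mask = _round(key, tweak, r, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, mask))
    return left + right

def tctr(key: bytes, tweak: bytes, data: bytes, g=lambda t, i: t) -> bytes:
    """TCTR[E]_K(T, X) for inputs that are a multiple of n bits.
    Because each block is XORed with E_K(T_i, <i>), the same routine
    also computes the inverse TCTR[E]_K^{-1}(T, Y)."""
    assert len(data) % BLOCK == 0
    out = bytearray()
    for i in range(1, len(data) // BLOCK + 1):
        t_i = g(tweak, i)                      # per-block tweak T_i = g(T, i)
        z_i = i.to_bytes(BLOCK, "big")         # Z_i = <i>, encoded as an n-bit block
        pad = toy_tbc(key, t_i, z_i)
        x_i = data[(i - 1) * BLOCK : i * BLOCK]
        out += bytes(a ^ b for a, b in zip(pad, x_i))
    return bytes(out)

# Usage: a fixed per-message tweak (the default g) versus re-tweaking each block.
retweak = lambda t, i: t + i.to_bytes(8, "big")
ct_fixed = tctr(b"k" * 16, b"tweak", bytes(48))
ct_retweaked = tctr(b"k" * 16, b"tweak", bytes(48), g=retweak)
```

Passing g as a parameter mirrors the text's separation between the counter-mode skeleton and the tweak schedule, so both scheduling options can be explored without changing the loop.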

Consider the behavior of TCTR when g(T, i) = T. The following result is
easily obtained using standard techniques.


Theorem 2. Let $\widetilde{E}: \{0,1\}^k \times \mathcal{T} \times \{0,1\}^n \to \{0,1\}^n$ be a tweakable blockcipher,
and let $\mathrm{TCTR}[\widetilde{E}]_K$ and $\mathrm{TCTR}[\widetilde{E}]^{-1}_K$ be defined as above, with $g(T, i) = T \in \mathcal{T}$.
Let A be a nonce-respecting adversary that runs in time t, and asks q queries,
each of length at most $\ell n$ bits (so, $\mu \le q \ell n$). Then for some adversary B making
at most $q\ell$ queries and running in time O(t),
$\mathbf{Adv}^{\mathrm{srnd}}_{\mathrm{TCTR}[\widetilde{E}]}(A) \le \mathbf{Adv}^{\widetilde{\mathrm{prp}}}_{\widetilde{E}}(B) + 0.5\, q \ell^2 / 2^n$.

We note that the bound displays birthday-type behavior when $\ell = o(\sqrt{q})$, and
is tightest when $\ell$ is a small constant. An important application with small,
constant $\ell$ is full-disk encryption. Here plaintexts X would typically be 4096
bytes long, so if the underlying TBC has blocksize n = 128, we get $\ell = 256$
blocks.
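To get a feel for the additive term in Theorem 2, the following sketch evaluates $0.5\,q\ell^2/2^n$ for the full-disk-encryption parameters just mentioned. The query count $q = 2^{40}$ is purely an assumption chosen for illustration; it does not come from the text.

```python
from fractions import Fraction


def tctr_birthday_term(q: int, ell: int, n: int) -> Fraction:
    """The additive term 0.5 * q * ell^2 / 2^n from Theorem 2, computed exactly."""
    return Fraction(q * ell * ell, 2 * 2 ** n)


n = 128
ell = (4096 * 8) // n              # a 4096-byte sector split into 128-bit blocks
q = 2 ** 40                        # assumed number of queries (illustrative only)
term = tctr_birthday_term(q, ell, n)
print(ell, term == Fraction(1, 2 ** 73))
```

With these numbers the term equals $2^{40} \cdot 2^{16} / 2^{129} = 2^{-73}$.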

Extending tweakspaces. In PIV, the TBC F will need to handle long tweaks.
Fortunately, a result by Coron, et al. [11] shows that one can compress tweaks
using an $\epsilon$-AU hash function at the cost of adding a $q^2\epsilon$ term to the tweakable
cipher's TPRP security bound. In particular, we will use (a slight specialization
of) the NH hash, defined by Black, et al. [5]; $\mathrm{NH}[r, s]_L$ takes r-bit keys ($|L| = r$),
maps r-bit strings to s-bit strings, and is $2^{-s/2}$-AU. Please see Figure 5 for the
description. Given a TBC E, $E^{\mathrm{NH}}$ denotes the resulting TBC, whose tweakspace
is now the domain of NH, rather than its range.

5 Actually, slightly less than this when used in the PIV composition, since the first N
bits are enciphered by F.

Fig. 4. The TCT2 construction (top). TCT2 takes $\tau n$-bit tweaks, and the input length
is between $2n$ and $\ell n$ bits, inclusive. Here, F is implemented using the 2n-bit CDMS
construction coupled with the NH hash function (bottom left). Both V and the TBC
E used inside of CDMS are implemented using $\mathrm{CLRW2}[\mathrm{polyH}^{rn}, E]$ (bottom right),
with r = 6 and r = 2, respectively. The function Pad maps s to $s \,\|\, 10^{(\ell+\tau-1)n-1-|s|}$. In
the diagram for CDMS, the strings 00T, 01T, and 10T are padded with 0s to length
5n before being used.

4.1 Targeting Efficiency at Birthday-Type Security: TCT1

Let us begin with the case of N = n. To instantiate the n-bit TBC F in PIV we
refer to the pioneering TBC work of Liskov, Rivest and Wagner [20], from which
we draw the LRW2 TBC; please refer to Figure 5 for a description.

Before we give the TCT1 construction, a few notes. In Figure 5 we see that
in addition to a blockcipher E, LRW2[H, E] uses an $\epsilon$-AXU$_2$ hash function, H,
and so, in theory, it could natively accommodate large tweaks. But for practical
purposes, it will be more efficient to implement LRW2 with a small tweakspace,
and then extend this using a fast $\epsilon$-AU hash function. For the $\epsilon$-AXU$_2$ hash
function itself, we use the polynomial hash polyH (also described in Figure 5).
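As a rough illustration of how LRW2 combines a blockcipher with polyH, here is a minimal Python sketch. Everything concrete in it is an assumption made for the example: the blockcipher is a hypothetical 4-round Feistel toy built from SHA-256 (standing in for something like AES), n is fixed at 128, the field polynomial is $x^{128} + x^7 + x^2 + x + 1$, and `poly_hash` follows our reading of the polyH definition in Figure 5 (the XOR over i of $T_i \otimes L^i$).

```python
import hashlib

N = 128
MASK = (1 << N) - 1
HALF = (1 << 64) - 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a = (a & MASK) ^ 0x87
    return r


def poly_hash(L: int, blocks) -> int:
    """polyH_L(T_1 ... T_m): XOR over i of T_i * L^i (our reading of Fig. 5)."""
    acc, power = 0, L
    for t in blocks:
        acc ^= gf_mul(t, power)
        power = gf_mul(power, L)
    return acc


def _f(key: bytes, rnd: int, half: int) -> int:
    d = hashlib.sha256(key + bytes([rnd]) + half.to_bytes(8, "big")).digest()
    return int.from_bytes(d[:8], "big")


def toy_encrypt(key: bytes, x: int) -> int:
    """Toy 4-round Feistel permutation on 128 bits (NOT a secure cipher)."""
    l, r = x >> 64, x & HALF
    for i in range(4):
        l, r = r, l ^ _f(key, i, r)
    return (l << 64) | r


def toy_decrypt(key: bytes, y: int) -> int:
    l, r = y >> 64, y & HALF
    for i in reversed(range(4)):
        l, r = r ^ _f(key, i, l), l
    return (l << 64) | r


def lrw2_encrypt(K: bytes, L: int, tweak_blocks, x: int) -> int:
    """LRW2[H, E]_(K,L)(T, X) = E_K(X xor H_L(T)) xor H_L(T)."""
    h = poly_hash(L, tweak_blocks)
    return toy_encrypt(K, x ^ h) ^ h


def lrw2_decrypt(K: bytes, L: int, tweak_blocks, y: int) -> int:
    h = poly_hash(L, tweak_blocks)
    return toy_decrypt(K, y ^ h) ^ h
```

Note that the hash value h depends only on the tweak, so when many blocks share one tweak it can be computed once and reused.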

Now we are ready to give our TCT1 construction, which is birthday-bound secure
for applications with small plaintext messages (e.g. FDE).


The TCT1 Construction. Fix $k, n > 0$, and let $N = n$. Let $E : \{0,1\}^k \times \{0,1\}^n \to
\{0,1\}^n$ be a blockcipher, and let $\mathrm{polyH}^{mn}$ and NH be as defined in Figure 5. Then
define TCT1 = PIV[F, V], where to obtain a $\tau n$-bit tweakspace and domain
$\{0,1\}^{\{n, n+1, \ldots, \ell n\}}$ we set:

1. n-bit TBC $F = \mathrm{LRW2}[\mathrm{polyH}^{2n}, E]^{\mathrm{NH}[(\ell+\tau)n,\, 2n]}$, i.e., LRW2 with its tweakspace
extended using NH. The keyspace for F is $\{0,1\}^k \times \{0,1\}^{2n} \times \{0,1\}^{(\ell+\tau)n}$,
with key K' partitioning into keys for E, $\mathrm{polyH}^{2n}$, and $\mathrm{NH}[(\ell+\tau)n, 2n]$.
(Since NH supports only fixed length inputs, we implicitly pad NH inputs
with a 1 and then as many 0s as are required to reach a total length of
$(\ell+\tau)n$ bits.) The tweakspace for F is $\{0,1\}^{\{0, 1, 2, \ldots, (\ell+\tau-1)n\}}$.

2. VIL tweakable cipher $V = \mathrm{TCTR}[\mathrm{LRW2}[\mathrm{polyH}^{n}, E]]$ with the TCTR function
$g : \{0,1\}^n \times \mathbb{N} \to \{0,1\}^n$ as $g(T, i) = T$. The keyspace for V is $\{0,1\}^k \times
\{0,1\}^n$, with key K partitioning into keys for E and $\mathrm{polyH}^{n}$. The tweakspace
for V is $\{0,1\}^n$, and its domain is $\{0,1\}^{\{0, 1, \ldots, (\ell-1)n\}}$.

Putting together Theorems 1 and 2 and results from previous works [6, 20], we have
the following security bound.

Theorem 3 (STPRP-security of TCT1). Define TCT1 as above, and let A
be an adversary making $q < 2^n/4$ queries and running in time t. Then there
exist adversaries B and C, both running in time $O(t)$ and making $(\ell - 1)q$ and
$2q$ queries, respectively, such that

$$\mathbf{Adv}^{\widetilde{\mathrm{sprp}}}_{\mathrm{TCT}_1}(A) \le \mathbf{Adv}^{\mathrm{prp}}_{E}(B) + \mathbf{Adv}^{\mathrm{sprp}}_{E}(C) + \frac{32q^2}{2^n} + \frac{4q^2(\ell-1)^2}{2^n}.$$

The proof appears in the full version. This algorithm requires $2k + (3 + \tau + \ell)n$
bits of key material, including two keys for E. As we show at the end of this
section, we can get away with a single key for E with no significant damage
to our security bound, although this improvement is motivated primarily by
performance concerns.

Thus TCT1 retains the security of previous constructions (see Figure 2 for a
visual comparison), but uses arithmetic in rings with powers-of-two moduli, rather
than in a finite field. This may potentially improve performance on some
architectures.

6 Indeed, one can show composing an $\epsilon$-AU hash function with an $\epsilon'$-AXU$_2$ hash
function yields an $(\epsilon + \epsilon')$-AXU$_2$ hash function; however, we prefer to work on a
higher level of abstraction.

4.2 Aiming for beyond Birthday-Bound Security: TCT2

Now let us consider the PIV composition with $N = 2n$. For the FIL component,
we can use Coron et al.'s [11] CDMS construction to get a 2n-bit TBC from an
n-bit TBC, and implement the latter using CLRW2, a recent beyond-birthday-bound
secure construction by Landecker, Shrimpton, and Terashima [19]. Figure 5
describes both constructions. We again extend the tweakspace using NH. (To
stay above the birthday bound, we set the range of NH to $\{0,1\}^{2n}$.) Ultimately,
setting $F = \mathrm{CDMS}[\mathrm{CLRW2}]^{\mathrm{NH}}$ is secure against up to around $2^{2n/3}$ queries.

CLRW2 also gives us a way to realize a beyond birthday-bound secure VIL
component, namely $V = \mathrm{TCTR}[\mathrm{CLRW2}[E, H]]$, at least for $\ell = o(q^{1/4})$. (We'll
see how to avoid this restriction, if desired, in a moment.)

We are now ready to give our second fully concrete PIV composition, TCT2,
targeted at applications that would benefit from beyond birthday-bound security.
This algorithm requires us to nest four layers of other constructions, so we
provide an illustration in Figure 4. Again we emphasize that the (admittedly
significant) cost of F can be amortized.

TCT2 supports $\tau n$-bit tweaks and has domain $\{0,1\}^{\{2n, 2n+1, \ldots, \ell n\}}$.

The TCT2 Construction. Fix $k, \ell, n, \tau > 0$, and let $N = 2n$. Let $E : \{0,1\}^k \times
\{0,1\}^n \to \{0,1\}^n$ be a blockcipher, and let $\mathrm{polyH}^{rn}$ and NH be as defined in
Figure 5. Then define TCT2 = PIV[F, V], where:

1. $F = \mathrm{CDMS}[\mathrm{CLRW2}[\mathrm{polyH}^{6n}, E]]^{\mathrm{NH}[(\ell+\tau-1)n,\, 4n]}$, that is, the 2n-bit TBC
$\mathrm{CDMS}[\mathrm{CLRW2}[\mathrm{polyH}^{6n}, E]]$ with its tweakspace extended using NH. The
keyspace for F is $\{0,1\}^{2k} \times \{0,1\}^{12n} \times \{0,1\}^{(\ell+\tau-1)n}$, with key K' partitioning
into two keys for E, two keys for $\mathrm{polyH}^{6n}$, and a key for $\mathrm{NH}[(\ell+\tau-1)n, 4n]$. The
tweakspace for F is $\{0,1\}^{\tau n}$.

2. $V = \mathrm{TCTR}[\mathrm{CLRW2}[\mathrm{polyH}^{2n}, E]]$, with the TCTR function $g : \{0,1\}^n \times
\mathbb{N} \to \{0,1\}^n$ as $g(T, i) = T$. The keyspace for V is $\{0,1\}^{2k} \times \{0,1\}^{4n}$ with key
K partitioning into two keys for E and two keys for $\mathrm{polyH}^{2n}$. The tweakspace
for V is $\{0,1\}^{2n}$, and its domain is $\{0,1\}^{\{0, 1, 2, \ldots, (\ell-2)n\}}$.

TCT2 requires $4k + (\ell + \tau + 15)n$ bits of key material. Putting together Theorems 1
and 2 and results from previous works [6, 11, 19], we have the following security result.


Theorem 4 (STPRP-security of TCT2). Define TCT2 as above, and let A
be an adversary making q queries and running in time t, where $6q, \ell q < 2^{2n}/4$.
Then there exist adversaries B and C, both running in $O(t)$ time and making
(£ — 1 )q and 6 q queries, respectively, such that Adv^^ (A) < 2Adv|, rp (R) + 
2Adv^’ rp (C') + ^ r + + 2 2 n 6 -'il e s q s + ■ 

7 We note that for CDMS[E], we enforce domain separation via E's tweak, whereas
the authors of [11] use multiple keys for E. The proof of our construction follows
easily from that of the original.

Again, the proof appears in the full version. Some of the constants in this bound
are rather significant. However, as Figure 5 shows, TCT2 nevertheless provides
substantially better security bounds than TCT1 and previous constructions.

4.3 Additional Practical Considerations 

Several variations and optimizations on TCT1 and TCT2 are possible. We
highlight a few of them here. None of these changes significantly impact the above
security bounds, unless otherwise noted.

Reducing the number of blockcipher keys. In the case of TCT1, we can use a single
key for both LRW2 instances provided we enforce domain separation through
the tweak. This allows us to use a single key for the underlying blockcipher,
which in some situations may allow for significant implementation benefits (for
example, by allowing a single AES pipeline). One method that accomplishes this
is to replace $\mathrm{LRW2}[\mathrm{polyH}^{2n}, E]^{\mathrm{NH}[(\ell+1)n,\, 2n]}$ with $\mathrm{LRW2}[\mathrm{polyH}^{3n}, E]^{f(\varepsilon, \cdot)}$ and
$\mathrm{LRW2}[\mathrm{polyH}^{n}, E]$ with $\mathrm{LRW2}[\mathrm{polyH}^{3n}, E]^{f(\cdot, \varepsilon)}$. Here, f is a $2^{-n}$-AU function
with keyspace $\{0,1\}^{3n} \times \{0,1\}^{\ell n}$, taking inputs of the form $(X, \varepsilon)$ (for some
$X \in \{0,1\}^n$) or $(\varepsilon, Y)$ (for some string Y), and outputting a 3n-bit
string. Let $f_L(X, \varepsilon) = 0^{2n} \,\|\, X$ and $f_L(\varepsilon, Y) = 1^{n} \,\|\, \mathrm{NH}[(\ell+1)n, 2n]_L(Y)$. The
function f described here is a mathematical convenience to unify the signatures
of the two LRW2 instances, thereby bringing tweak-based domain separation
into scope; in practice, we imagine the two instances would be implemented
independently, save for a shared blockcipher key. We note that TCT2 can be
modified in a similar manner to require only two blockcipher keys.

Performance optimizations. If we need only a tweakable (FIL) blockcipher, we
can use $\mathrm{NH}[\ell n, 2n]$ in place of $\mathrm{NH}[(\ell+1)n, 2n]$ by adjusting our padding scheme
appropriately. We emphasize that in the TCTR portion, the polyH functions
only need to be computed once, since each LRW2 invocation uses the same
tweak. The corresponding optimizations apply to TCT2, as well.

A naive implementation of TCT2 would make a total of 72 finite field
multiplications during the two FIL phases (a result of evaluating $\mathrm{polyH}^{6n}$ twelve times).
We can cache an intermediate value of the $\mathrm{polyH}^{6n}$ hash used inside of CDMS
(four n-bit tweak blocks are constant per invocation), and this saves 32 finite field
multiplications. Precomputing the terms of the polynomial hash corresponding
to the domain-separation constants eliminates 12 more multiplications, leaving
28 in total. Four more are required during the VIL phase, giving the count of 32
reported in Table 1.
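The multiplication count can be retraced step by step using only the figures stated in the paragraph above:

```latex
\begin{align*}
\text{naive FIL cost}   &= 12 \times 6 = 72 \\
\text{after caching}    &= 72 - 32 = 40 \\
\text{after precomputing domain-separation terms} &= 40 - 12 = 28 \\
\text{including the VIL phase} &= 28 + 4 = 32
\end{align*}
```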

Handling large message spaces. Both TCT1 and TCT2 are designed with FDE
applications in mind. In particular, they require $\ell$ to be fixed ahead of time, and
require more than $\ell n$ bits of key material.

These limitations are a consequence of using the NH hash function; however,
a simple extension to NH (described by the original authors [5]) accommodates
arbitrarily long strings. Fix a positive integer r and define

$$\mathrm{NH}^*_L(M_1 M_2 \cdots M_\nu) = \mathrm{NH}_L(M_1) \,\|\, \mathrm{NH}_L(M_2) \,\|\, \cdots \,\|\, \mathrm{NH}_L(M_\nu) \,\|\, \langle |M| \bmod rn \rangle,$$

where $|M_i| = rn$ for $i < \nu$, $|M_\nu| \le rn$, and $\mathrm{NH}_L$ abbreviates $\mathrm{NH}_L[rn, 2N]$. Thus defined, NH* is
$2^{-N}$-almost universal, has domain $\{0,1\}^*$, and requires rn bits of key material.
This modification shifts some of the weight to the polyH hash; we now require
eight extra finite field multiplications for each additional rn bits of input. As
long as r > 4, however, we require fewer of these multiplications when compared
to previous hash-ECB-hash or hash-CTR-hash constructions.
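The chunk-then-append-length structure of NH* can be sketched as follows. This is a hedged illustration, not the authors' implementation: the word size w = 32, the parameter names `v` and `t`, the Toeplitz key layout (our reading of Figure 5), and zero-padding of a short final chunk are all assumptions made for the example.

```python
W = 32                 # word size w in bits (assumed)
WB = W // 8            # bytes per word


def nh_single(key_words, msg_words):
    """Sum of (K_{2i-1} + X_{2i-1}) * (K_{2i} + X_{2i}), additions mod 2^w,
    total mod 2^(2w)."""
    total = 0
    for i in range(0, len(msg_words), 2):
        a = (key_words[i] + msg_words[i]) % (1 << W)
        b = (key_words[i + 1] + msg_words[i + 1]) % (1 << W)
        total += a * b
    return total % (1 << (2 * W))


def nh(key_words, msg_words, t):
    """Toeplitz NH: t hashes, with the key window shifted by two words each."""
    v = len(msg_words)
    assert v % 2 == 0 and len(key_words) >= v + 2 * (t - 1)
    return [nh_single(key_words[2 * j: 2 * j + v], msg_words) for j in range(t)]


def nh_star(key_words, msg: bytes, v: int, t: int):
    """NH*: hash every v-word chunk with the same key, then append the
    message length reduced modulo the chunk length in bits."""
    chunk_bytes = v * WB
    out = []
    for start in range(0, len(msg), chunk_bytes):
        chunk = msg[start:start + chunk_bytes].ljust(chunk_bytes, b"\0")  # assumed padding
        words = [int.from_bytes(chunk[i:i + WB], "big")
                 for i in range(0, chunk_bytes, WB)]
        out.extend(nh(key_words, words, t))
    out.append((8 * len(msg)) % (8 * chunk_bytes))
    return out
```

The trailing length field is what separates a short final chunk from the same chunk with explicit trailing zero bytes.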

With these modifications, the final two terms in TCT1's security bound
(Theorem 3) would become $8q^2/2^n + 600q^2\ell^2/(r^2 2^n) + 4q^2(\ell - 1)^2/2^n$, where $\ell n$ is
now the length of the adversary's longest query, $\ell > 2.5r$, and the remaining
terms measure the (S)PRP security of the underlying blockcipher. We also
assume $2^n > rn$, so that $|M| \bmod rn$ can be encoded within a single n-bit block.
Although the constant of 600 is large, we note that setting r = 16, for
example, reduces it to a more comfortable size, in this case to less than three. The
bound for TCT2 changes in a similar manner. (Note that if $2^{n-2} > rn$, we can
use a single n-bit block for both the tweak domain-separation constants and
$\langle |M| \bmod rn \rangle$.)
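The effect of r on the large constant can be checked directly from the middle term of the bound, taking r = 16 as in the text:

```latex
\[
\frac{600\,q^2\ell^2}{r^2\,2^n}\bigg|_{r=16}
  = \frac{600}{256}\cdot\frac{q^2\ell^2}{2^n}
  \approx 2.34\,\frac{q^2\ell^2}{2^n},
\qquad 2.34 < 3 .
\]
```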

Beyond birthday-bound security for long messages. When $\ell$ is not bounded to
some small or moderate value, TCT2 no longer provides beyond-birthday-bound
security. The problematic term in the security bound is $q(\ell - 1)^2/2^n$. To address
this, we return to TCTR (Figure 3) and consider a different per-block tweak
function.

In particular, $g(T, i) = T \,\|\, \langle i \rangle$. In the nonce-respecting case, the underlying
TBC E is then retweaked with a never-before-seen value on each message block.
Again, think about what happens when E is replaced by an ideal cipher $\Pi$:
in the nonce-respecting case, every block of plaintext is masked by the output
of a fresh random permutation. In other words, every block returned will be
uniformly random. Thus we expect a tight bound, in this case. Formalizing this
logic yields the following theorem.

Theorem 5. Let $E : \{0,1\}^k \times \mathcal{T} \times \{0,1\}^n \to \{0,1\}^n$ be a tweakable blockcipher,
and let $\mathrm{TCTR}[E]_K$ and $\mathrm{TCTR}[E]_K^{-1}$ be defined as above, with $g : \mathcal{T}' \times \mathbb{N} \to \mathcal{T}$ an
arbitrary injective mapping. Let A be a nonce-respecting adversary that runs in
time t, and asks q queries of total length at most $\mu = \sigma n$ bits. Then there exists
some adversary B making at most $\sigma$ queries and running in time $O(t)$ such that

$$\mathbf{Adv}_{\mathrm{TCTR}[E]}(A) \le \mathbf{Adv}^{\widetilde{\mathrm{prp}}}_{E}(B).$$

Consequently, using this variation of TCTR in Theorems 3 and 4 would remove
the $q(\ell - 1)^2$ term from the bounds, thereby lifting message length concerns.
Note that if this change is made, $g(T, i)$ needs to be computed up to $\ell$ times per
invocation, rather than just once. This problem may be mitigated by using the
XEX [25] TBC in place of LRW2, which makes incrementing the tweak extremely
fast without significantly changing our security bound.

8 Notice that one could use (say) $Z_i \leftarrow 0^n$ and the same would be true. We present it
as $Z_i \leftarrow \langle i \rangle$ for expositional purposes.

When the above changes are made, TCT1 and TCT2 offer efficient tweakable
ciphers on an unbounded domain, losing security guarantees only after $O(2^{n/2})$
(resp., $O(2^{2n/3})$) bits have been enciphered. Finally, we note that one can use
a conventional blockcipher mode of operation to build the VIL component. We
report on this in the full version.

A Components for TCT1 and TCT2


LRW2 [20]: Birthday-bound TBC. Needs blockcipher E, $\epsilon$-AXU$_2$ function H.

    $\mathrm{LRW2}[H, E]_{(K,L)}(T, X) = E_K(X \oplus H_L(T)) \oplus H_L(T)$

CLRW2 [19]: TBC with beyond-birthday-bound security. Requires blockcipher E and
$\epsilon$-AXU$_2$ function H.

    $\mathrm{CLRW2}[H, E]_{(K_1, K_2, L_1, L_2)}(T, X) = \mathrm{LRW2}[H, E]_{(K_2, L_2)}(T, \mathrm{LRW2}[H, E]_{(K_1, L_1)}(T, X))$

polyH$^{mn}$ [34]: $\epsilon$-AXU$_2$ function with domain $(\{0,1\}^n)^m$ and $\epsilon = m/2^n$. All operations
in $\mathbb{F}_{2^n}$.

    $\mathrm{polyH}^{mn}_L(T_1 T_2 \cdots T_m) = \bigoplus_{i=1}^{m} T_i \otimes L^i$

NH[$\nu w$, $2tw$] [6]: $\epsilon$-AU hash function with $\epsilon = 1/2^{tw}$. Inputs are $\nu w$ bits, where $\nu$ is
even and $w > 0$ is fixed.

    $\mathrm{NH}[\nu w, 2tw]_{K_1 \| \cdots \| K_{\nu+2(t-1)}}(M) = H_{K_1 \cdots K_\nu}(M) \,\|\, H_{K_3 \cdots K_{\nu+2}}(M) \,\|\, \cdots \,\|\, H_{K_{2t-1} \cdots K_{\nu+2t-2}}(M)$

    where $H_{K_1 \cdots K_\nu}(X_1 \cdots X_\nu) = \sum_{i=1}^{\nu/2} (K_{2i-1} +_w X_{2i-1}) \cdot (K_{2i} +_w X_{2i}) \bmod 2^{2w}$.

CDMS [11]: Feistel-like domain extender for TBC E.

    $\mathrm{CDMS}[E]_K(T, L \,\|\, R) = E_K(10 \,\|\, T \,\|\, R', L') \,\|\, R'$
    where $R' = E_K(01 \,\|\, T \,\|\, L', R)$ and $L' = E_K(00 \,\|\, T \,\|\, R, L)$.


Fig. 5. TCT1 and TCT2 use these constructions as components

Abstract. Online ciphers encrypt an arbitrary number of plaintext
blocks and output ciphertext blocks which only depend on the preceding 
plaintext blocks. All online ciphers proposed so far are essentially serial, 
which significantly limits their performance on parallel architectures such 
as modern general-purpose CPUs or dedicated hardware. We propose the 
first parallelizable online cipher, COPE. It performs two calls to the un- 
derlying block cipher per plaintext block and is fully parallelizable in 
both encryption and decryption. COPE is proven secure against chosen- 
plaintext attacks assuming the underlying block cipher is a strong PRP. 
We then extend COPE to create COPA, the first parallelizable, online 
authenticated cipher with nonce-misuse resistance. COPA only requires 
two extra block cipher calls to provide integrity. The privacy and integrity 
of the scheme is proven secure assuming the underlying block cipher is a 
strong PRP. Our implementation with Intel AES-NI on a Sandy Bridge 
CPU architecture shows that both COPE and COPA are about 5 times 
faster than their closest competition: TC1, TC3, and McOE-G. This high
factor of advantage emphasizes the paramount role of parallelizability on 
up-to-date computing platforms. 

Keywords: Block cipher, tweakable cipher, online cipher, authenticated 
encryption, nonce-misuse resistance, parallelizability, AES. 


1 Introduction 

Online Ciphers. A cipher which takes input of arbitrary length is said to be 
an online cipher if it can output ciphertext blocks as it is receiving the plaintext 
blocks. Specifically, the ith ciphertext block should only depend on the key and 
the first i plaintext blocks. This desirable functionality known more generally as 
online data processing is characteristic for other cryptographic primitives such 
as standard encryption schemes like CTR, CBC, OFB, and CFB. 

The first theoretical treatment of online ciphers was put forward by Bellare,
Boldyreva, Knudsen, and Namprempre [4]. They introduce the online ciphers
HCBC1 and HCBC2, both of which require the use of two keys, one for the
underlying block cipher and the other for the almost-xor-universal hash family.
Subsequently, Nandi proposed two more efficient online ciphers, MHCBC and
MCBC. MHCBC improves upon HCBC2 by using a smaller hashing key with
a finite field multiplication as universal hash function, whereas MCBC does not
even require a universal hash function, thus needing only one block cipher key
and calling the block cipher twice per plaintext block. Rogaway and Zhang in [17]
recast the formalism of Bellare et al. [3] in terms of tweakable block ciphers [17]
and provide three generalizations of the previous online ciphers: TC1, TC2, and
TC3.

K. Sako and P. Sarkar (Eds.): ASIACRYPT 2013 Part I, LNCS 8269, pp. 424-443, 2013.

Authenticated Encryption from Online Ciphers. While online
authenticated encryption (AE) schemes are not a novelty, presently we are aware of only
one family of online and misuse-resistant AE schemes, McOE. McOE makes
use of the online cipher TC3 to build its general structure and adds two calls
to the tweakable cipher to achieve authenticity. To process messages of arbitrary
lengths, McOE applies a tag splitting method, similar to the ciphertext stealing
method [5].

Bellare et al. [1] give a few generic transformations to turn an online cipher 
into a secure authenticated encryption scheme. 

Problem Statement. All existing online ciphers are highly sequential and 
none of them offer any possibility for parallelizing the computation between 
distinct block cipher calls. The only exception can be seen in TC1, which allows
parallelization only in decryption but not in encryption. As a consequence, the 
McOE AE schemes are not parallelizable either, due to the fact that they are 
based on existing online ciphers. 

At the same time, in the overwhelming majority of cases in practice, the 
underlying cipher is AES which is very well parallelizable on many platforms. 
Parallelization is a rather inherent feature of hardware implementations, both 
in ASIC and FPGA. Also in general-purpose software, parallelizable encryption 
algorithms have profited in terms of performance due to the bitslice approach for 
a long time already [H1II31IIB] ■ However, with the introduction of the hardware 
supported AES by Intel in general-purpose x86 CPUs as an instruction set AES- 
NI in Intel Westmere, Sandy Bridge, and Ivy Bridge — followed by the AMD 
adoption of AES-NI in AMD Bulldozer and Piledriver — the parallelizability 
of the AES modes of operation has become of truly paramount importance. 
With AES-NI, using a parallelizable mode of operation enables performance 
advantages of a factor 3 and higher — see, for instance, the case of the (serial) 
CBC encryption vs (parallel) CBC decryption [1].

Our Contributions. We present the first parallelizable online cipher, COPE, 
and the first parallelizable online authenticated encryption scheme with nonce- 
misuse resistance, COPA. 

COPE: Our novel design is illustrated in Fig. 1. To process a single plaintext
block two block cipher calls are required. A secret mask (tweak) is applied
to the plaintext block and used as input to the first block cipher call. Then
the output of the second block cipher call is masked again to produce the
ciphertext block.

1 Examples of online AE schemes include EAX [5], GCM [19], and OCB1-3 [16, 25, 26].

By introducing dummy masks, each block cipher call can be viewed as an 
instance of the XEX construction [25], which uses the so-called “doubling” 
mask generation. Our basic design only deals with message lengths that are a 
multiple of the block length. In order to handle messages of arbitrary lengths 
we use the technique prescribed in the XLS domain extender by Ristenpart 
et al. in [53]. In contrast with previous designs, our scheme only uses a single
key and a single cryptographic primitive, namely a block cipher.

COPE is proven IND-CPA up to the birthday bound of n/2-bit security, 
where n denotes the block size of the underlying block cipher. 

COPA: We transform COPE to support authentication, while maintaining 
parallelizability. The modifications are limited to computing an XOR sum of 
the plaintext data and using two extra block cipher calls; these can be seen 
in Fig. 2. The scheme also supports associated data in a way similar to how
PMAC1 [25] operates. The privacy and integrity of COPA are proven up to 
the birthday bound. 

To illustrate the impact of the parallelizability of our online schemes, we
implement them with AES-NI on an Intel Sandy Bridge processor. We systematically
compare the performance we attain with the online ciphers TC1, TC3, and
MCBC as well as the online AE scheme McOE-G when instantiated with the
AES. When compared to these closest online competitors, which are all explic- 
itly not parallelizable, our modes provide performance improvements between a 
factor of 4.5 and 5, being below 2 cycles per byte on a single core. We expect 
almost a linear speed-up when several cores are available. 

Organization of the Paper. The remainder of the paper is organized as 
follows. We recall the necessary background on block ciphers in Section 2. Section 
3 provides the specification of our new parallel modes. Sections 4 and 5 deal with 
the security proofs. Section 6 gives AES-NI implementations of our modes along 
with a systematic comparison to the state-of-the-art schemes. We conclude in 
Section 7. 

2 Preliminaries 

2.1 Block Ciphers 

A block cipher $E : \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$ is a function that takes as input
a key $k \in \mathcal{K}$ and a plaintext $M \in \{0,1\}^n$, and produces a ciphertext $C =
E(k, M)$. We sometimes write $E_k(\cdot) = E(k, \cdot)$. For a fixed key k, a block cipher
is a permutation on n bits, and we denote the inverse permutation (decryption
function) by $E_k^{-1}$.

Let Perm(n) be the set of all permutations on n bits. When writing $x \xleftarrow{\$} X$
for some finite set X we mean that x is sampled uniformly from X. We write
Pr[A | B] to denote the probability of event A given B.

Definition 1. Let E be a block cipher. The prp±1 advantage of a distinguisher
$\mathcal{D}$ is defined as

$$\mathbf{Adv}^{\mathrm{prp}\pm 1}_{E}(\mathcal{D}) = \left| \Pr\left[\mathcal{D}^{E_k, E_k^{-1}} = 1\right] - \Pr\left[\mathcal{D}^{\pi, \pi^{-1}} = 1\right] \right| .$$

Here, $\mathcal{D}$ is a distinguisher with oracle access to either $(E_k, E_k^{-1})$ or $(\pi, \pi^{-1})$.
The probabilities are taken over $k \xleftarrow{\$} \mathcal{K}$, $\pi \xleftarrow{\$} \mathrm{Perm}(n)$ and random coins of $\mathcal{D}$,
if any. By $\mathbf{Adv}^{\mathrm{prp}\pm 1}_{E}(t, q)$ we denote the maximum advantage taken over all
distinguishers that run in time t and make q queries.

We shall also write $E_k^{\pm 1}$ for $(E_k, E_k^{-1})$. Similarly, $\pi^{\pm 1}$ means $(\pi, \pi^{-1})$, and so
on.


2.2 Binary Fields 

The set $\{0,1\}^n$ of bit strings can be considered as the finite field $\mathrm{GF}(2^n)$
consisting of $2^n$ elements. To do this, we represent an element of $\mathrm{GF}(2^n)$ as a polynomial
over the field GF(2) of degree less than n. A string $a_{n-1}a_{n-2} \cdots a_1 a_0 \in \{0,1\}^n$
corresponds to the polynomial $a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0 \in \mathrm{GF}(2^n)$.

The addition in the field is just the addition of polynomials over GF(2) (that
is, bitwise XOR, denoted by $\oplus$). To define multiplication in the field, we fix
an irreducible polynomial $f(x)$ of degree n over the field GF(2). Given two
elements $a(x), b(x) \in \mathrm{GF}(2^n)$, their product is defined as $a(x)b(x) \bmod f(x)$, that is,
polynomial multiplication over the field GF(2) reduced modulo $f(x)$. We simply
write $a(x)b(x)$ and $a(x) \cdot b(x)$ to mean the product in the field $\mathrm{GF}(2^n)$.

The set $\{0,1\}^n$ can be also regarded as a set of integers ranging from 0
through $2^n - 1$. A string $a_{n-1}a_{n-2} \cdots a_1 a_0 \in \{0,1\}^n$ corresponds to the
integer $a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \cdots + a_1 2 + a_0 \in [0, 2^n - 1]$. We often write elements
of $\mathrm{GF}(2^n)$ as integers, based on these conversions. So, for example, "2" means $x$,
"3" means $x + 1$, and "7" means $x^2 + x + 1$. When we write multiplications such
as $2 \cdot 3$ and $7^2$, we mean those in the field $\mathrm{GF}(2^n)$.
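The integer encoding of field elements makes multiplication easy to sketch. The example below assumes n = 128 and the reduction polynomial $f(x) = x^{128} + x^7 + x^2 + x + 1$ (the one the paper names for n = 128); the function name `gf_mul` is ours.

```python
N = 128
REDUCTION = (1 << 128) | 0x87   # f(x) = x^128 + x^7 + x^2 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128), both encoded as integers."""
    result = 0
    while b:
        if b & 1:            # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> N:           # degree reached 128: reduce modulo f(x)
            a ^= REDUCTION
    return result


print(gf_mul(2, 3), gf_mul(7, 7))
```

Here `gf_mul(2, 3)` is $x(x+1) = x^2 + x$, encoded as 6, and `gf_mul(7, 7)` is $(x^2+x+1)^2 = x^4 + x^2 + 1$, encoded as 21; neither product needs reduction.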

2.3 XE and XEX Constructions of Tweakable Ciphers 

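Given a block cipher $E : \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$ and a secret mask $\Delta \in \{0,1\}^n$,
the ciphers

$$\widetilde{E}_{k,\Delta}(x) := E_k(x \oplus \Delta) \quad\text{or}\quad \overline{E}_{k,\Delta}(x) := E_k(x \oplus \Delta) \oplus \Delta$$

behave like another block cipher independent of $E_k$ (up to some bound). In
the case of $\widetilde{E}_{k,\Delta}$, adversaries are allowed to make only forward queries, whereas
$\overline{E}_{k,\Delta}$ accepts both encryption and decryption queries. Now consider a set of
secret masks $\{\Delta_i\}_{i \in \mathcal{T}}$, where $\Delta_i$ and $\Delta_j$ may not be necessarily independent.

An index $i \in \mathcal{T}$ is called a tweak, which is not secret. We obtain a tweakable
cipher $\widetilde{E} : \mathcal{K} \times \mathcal{T} \times \{0,1\}^n \to \{0,1\}^n$ by defining $\widetilde{E}_{k,i} := \widetilde{E}_{k,\Delta_i}$, and similarly
$\overline{E}_{k,i}$. We consider $\widetilde{E}_{k,i}$ and $\overline{E}_{k,j}$ together, where $i \in \mathcal{T}_0$, $j \in \mathcal{T}_1$ and $\mathcal{T}_0 \cap \mathcal{T}_1 = \emptyset$.

Definition 2. Let $\widetilde{E}$, $\overline{E}$ be tweakable ciphers. The twk advantage of a
distinguisher $\mathcal{D}$ is defined as

$$\mathbf{Adv}^{\mathrm{twk}}_{\widetilde{E},\overline{E}}(\mathcal{D}) = \left| \Pr\left[\mathcal{D}^{\widetilde{E}_{k,i},\, \overline{E}^{\pm 1}_{k,j}} = 1\right] - \Pr\left[\mathcal{D}^{\pi_i,\, \pi_j^{\pm 1}} = 1\right] \right| .$$

Here, $\mathcal{D}$ is a distinguisher with oracle access to a series of permutations. The
tweaks run over $i \in \mathcal{T}_0$ and $j \in \mathcal{T}_1$ where $\mathcal{T}_0 \cap \mathcal{T}_1 = \emptyset$. By $\mathbf{Adv}^{\mathrm{twk}}_{\widetilde{E},\overline{E}}(t, q)$ we
denote the maximum advantage taken over all distinguishers that run in time t
and make q queries in total.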

The doubling method [25] enables us to produce many different values of the
mask $\Delta$ from just one secret value $L := E_k(0)$. Namely, the masks are
produced as $\Delta = 2^{\alpha}3^{\beta}7^{\gamma}L$ for varying indices of $\alpha$, $\beta$ and $\gamma$. To do this, we need to
choose our irreducible polynomial $f(x)$ carefully. First, $f(x)$ needs to be primitive,
meaning that 2 generates the whole multiplicative group. Second, we make
sure that $\log_2 3$ and $\log_2 7$ are both "huge." Third, we check if $\log_2 3$ and $\log_2 7$
are "apart enough" (modulo $2^n - 1$). We impose these conditions to ensure that
values $2^{\alpha}3^{\beta}7^{\gamma}$ do not collide or become equal to 1. For example, when n = 128,
the irreducible polynomial $f(x) = x^{128} + x^7 + x^2 + x + 1$ satisfies these requirements,
making values $2^{\alpha}3^{\beta}7^{\gamma}$ all distinct and not equal to 1 for $\alpha \in [-2^{108}, 2^{108}]$
and $\beta, \gamma \in [-2^7, 2^7]$ [35], except for $(\alpha, \beta, \gamma) = (0, 0, 0)$. So we obtain tweakable
ciphers $\widetilde{E}_{k,\alpha\beta\gamma}$ and $\overline{E}_{k,\alpha\beta\gamma}$.
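Mask generation by doubling needs only shifts and XORs. The sketch below assumes n = 128 with $f(x) = x^{128} + x^7 + x^2 + x + 1$, handles non-negative exponents only, and uses helper names (`double`, `triple`, `septuple`, `mask`) that are ours rather than the paper's. In the actual schemes L would be the secret value $E_k(0)$; here it is just an integer argument.

```python
N = 128
MASK = (1 << N) - 1


def double(x: int) -> int:
    """Multiply by 2 (that is, by x) in GF(2^128)."""
    x <<= 1
    return (x & MASK) ^ 0x87 if x >> N else x


def triple(x: int) -> int:
    """Multiply by 3 = x + 1: one doubling and one XOR."""
    return double(x) ^ x


def septuple(x: int) -> int:
    """Multiply by 7 = x^2 + x + 1: two doublings and two XORs."""
    d = double(x)
    return double(d) ^ d ^ x


def mask(L: int, alpha: int, beta: int, gamma: int) -> int:
    """Compute 2^alpha * 3^beta * 7^gamma * L for non-negative exponents."""
    delta = L
    for _ in range(alpha):
        delta = double(delta)
    for _ in range(beta):
        delta = triple(delta)
    for _ in range(gamma):
        delta = septuple(delta)
    return delta
```

Since consecutive block indices differ by one doubling, a mode can update its mask incrementally instead of calling `mask` from scratch for every block.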

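Lemma 1 (XE and XEX [25]). Let $\mathcal{T}_0, \mathcal{T}_1 = \{(\alpha, \beta, \gamma)\}$ be two sets of integer
triples such that $2^{\alpha}3^{\beta}7^{\gamma}$ are all distinct and not equal to 1, in particular $\mathcal{T}_0 \cap \mathcal{T}_1 =
\emptyset$. Then the permutations $\{\widetilde{E}_{k,\alpha\beta\gamma}\}_{\mathcal{T}_0} \cup \{\overline{E}_{k,\alpha\beta\gamma}\}_{\mathcal{T}_1}$ are indistinguishable from
independently random permutations $\{\pi_{\alpha\beta\gamma}\}_{\mathcal{T}_0} \cup \{\pi_{\alpha\beta\gamma}\}_{\mathcal{T}_1}$. Specifically, for given
t, q, there exists a $t' \approx t$ such that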

Adv|| (t,q) < + Adv^V^). 


3 COPE and COPA: Design and Specification 

We define COPE and COPA. COPE is an online cipher secure against chosen 
plaintext attacks. COPE makes two calls to the underlying block cipher per 
message block. COPA is an authenticated online cipher that builds on COPE. 
The additional cost of producing a tag is kept minimal — a message checksum 
and two extra block cipher calls. COPA accepts associated data input. 

In this section we assume that the message length is a positive multiple of n.
The length of associated data can be fractional. In App. A we show how to handle
fractional messages with COPE and COPA. At the end of this section we give
the design rationale for our constructions, explaining our choice of operations.

Fig. 1. Online cipher COPE. Set V := 0 for COPE. Variable S will be used later by
COPA.

3.1 COPE Definition 

Let $E : \mathcal{K} \times \{0,1\}^n \to \{0,1\}^n$ be an n-bit block cipher, and denote $L := E_k(0)$.
The encryption and decryption procedures of the COPE online cipher on a
message $M[1]M[2] \cdots M[d]$ of d n-bit blocks and on a ciphertext $C[1]C[2] \cdots C[d]$
are then defined as:

COPE-Encrypt $\mathcal{E}[E]$:
    V[0] ← L, Δ_0 ← 3L, Δ_1 ← 2L
    for i = 1, ..., d do
        V[i] ← E_k(M[i] ⊕ Δ_0) ⊕ V[i-1]
        C[i] ← E_k(V[i]) ⊕ Δ_1
        Δ_0 ← 2Δ_0, Δ_1 ← 2Δ_1
    end for

COPE-Decrypt $\mathcal{E}^{-1}[E]$:
    V[0] ← L, Δ_0 ← 3L, Δ_1 ← 2L
    for i = 1, ..., d do
        V[i] ← E_k^{-1}(C[i] ⊕ Δ_1)
        M[i] ← E_k^{-1}(V[i] ⊕ V[i-1]) ⊕ Δ_0
        Δ_0 ← 2Δ_0, Δ_1 ← 2Δ_1
    end for

The encryption operation is illustrated in Fig. 1.
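The two procedures above translate almost line by line into code. The following Python sketch is an illustration under explicit assumptions: the block cipher is a hypothetical 4-round Feistel toy built from SHA-256 (the paper's implementation uses AES), n is 128, doubling uses $x^{128} + x^7 + x^2 + x + 1$, and blocks are represented as Python integers.

```python
import hashlib

N = 128
MASK = (1 << N) - 1
HALF = (1 << 64) - 1


def _f(key: bytes, rnd: int, half: int) -> int:
    d = hashlib.sha256(key + bytes([rnd]) + half.to_bytes(8, "big")).digest()
    return int.from_bytes(d[:8], "big")


def E(key: bytes, x: int) -> int:
    """Toy 128-bit Feistel permutation standing in for E_k (NOT secure)."""
    l, r = x >> 64, x & HALF
    for i in range(4):
        l, r = r, l ^ _f(key, i, r)
    return (l << 64) | r


def D(key: bytes, y: int) -> int:
    """Inverse of the toy permutation, standing in for E_k^{-1}."""
    l, r = y >> 64, y & HALF
    for i in reversed(range(4)):
        l, r = r ^ _f(key, i, l), l
    return (l << 64) | r


def dbl(x: int) -> int:
    """Multiply by 2 in GF(2^128)."""
    x <<= 1
    return (x & MASK) ^ 0x87 if x >> N else x


def cope_encrypt(key: bytes, blocks):
    L = E(key, 0)
    v = L                              # V[0] <- L
    d0, d1 = dbl(L) ^ L, dbl(L)        # Delta_0 <- 3L, Delta_1 <- 2L
    out = []
    for m in blocks:
        v = E(key, m ^ d0) ^ v         # V[i] <- E_k(M[i] xor D0) xor V[i-1]
        out.append(E(key, v) ^ d1)     # C[i] <- E_k(V[i]) xor D1
        d0, d1 = dbl(d0), dbl(d1)
    return out


def cope_decrypt(key: bytes, blocks):
    L = E(key, 0)
    v_prev = L
    d0, d1 = dbl(L) ^ L, dbl(L)
    out = []
    for c in blocks:
        v = D(key, c ^ d1)                  # V[i] <- E_k^{-1}(C[i] xor D1)
        out.append(D(key, v ^ v_prev) ^ d0) # M[i] <- E_k^{-1}(V[i] xor V[i-1]) xor D0
        v_prev = v
        d0, d1 = dbl(d0), dbl(d1)
    return out
```

In the encryption loop the inner calls `E(key, m ^ d0)` do not depend on one another, which is where the parallelism comes from; only the XOR chaining of the V values is sequential. Two messages with a common prefix also share the corresponding ciphertext prefix, as the online property requires.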

3.2 COPA Definition 

The core of the authenticated online cipher COPA is identical to COPE. The
only differences are that first, an authentication tag T is generated after the
COPE cipher invocation, and second, that associated data (if any) is processed
before the cipher iteration to produce a value V that is XOR-ed into the first
intermediate block chaining (see Fig. 1): $V[0] \leftarrow V \oplus L$. If there is no associated
data, then we set $V := 0$.

The tag T is computed by keeping a XOR checksum of the message blocks
$\Sigma := M[1] \oplus \cdots \oplus M[d]$ and computing

$$T \leftarrow E_k\bigl(E_k(\Sigma \oplus 2^{d-1}3^2 L) \oplus S\bigr) \oplus 2^{d-1}7L,$$

with $S := V[d]$ denoting the last intermediate value in COPE's block chaining, as
in Fig. 1. The tag computation is illustrated in Fig. 2a. The value V is generated
as follows: any associated data $A[1], \ldots, A[a]$ is padded (if not a multiple of n
bits) by a one and as many zeroes as necessary to obtain a multiple of the
block size n. These blocks are then processed by a PMAC1-like [22] iteration as
illustrated in Fig. 2b. Here, the block "A[a]10*" replaces the block "A[a]" if A[a]
itself is not n bits. Tag verification occurs by checking if

$$S \oplus E_k(\Sigma \oplus 2^{d-1}3^2 L) = E_k^{-1}(T \oplus 2^{d-1}7L),$$

where the tag is rejected if the equality is not true.

Fig. 2. Authenticated online cipher COPA: (a) tag generation; (b) processing of
associated data
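The verification equation is the tag equation with the outer mask and the outer block cipher call moved to the other side. Using only the symbols defined above:

```latex
\begin{align*}
T &= E_k\bigl(E_k(\Sigma \oplus 2^{d-1}3^2 L) \oplus S\bigr) \oplus 2^{d-1}7L \\
\Longleftrightarrow\quad T \oplus 2^{d-1}7L &= E_k\bigl(E_k(\Sigma \oplus 2^{d-1}3^2 L) \oplus S\bigr) \\
\Longleftrightarrow\quad E_k^{-1}(T \oplus 2^{d-1}7L) &= E_k(\Sigma \oplus 2^{d-1}3^2 L) \oplus S .
\end{align*}
```

Each step is an equivalence because XOR with a fixed mask and $E_k$ are both invertible.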


3.3 COPE and COPA for Arbitrary-Length Messages 

We explain how to extend our schemes to accept "fractional" messages M in
App. A. Here the length |M| is not necessarily a positive multiple of the block
size n. Note that simply using 10* padding to M would result in ciphertext
expansion. The methods described in App. A avoid such expansion with minimal
loss of efficiency.


3.4 Design Rationale 

One could combine universal hashing with a block cipher to design an AE scheme.
Indeed, McOE-G [11] follows this approach. However, we decided to avoid the use
of universal hashing, for three reasons. First, the use of universal hashing would
result in additional implementation cost, in particular in hardware. Second,
a recent study shows that there is an issue of weak keys with polynomial-based
hashing [22]. Third, on the latest Intel CPUs, one call of AES is faster than
one multiplication over the finite field $\mathrm{GF}(2^{128})$, which is an operation used for
polynomial-based hashing.

There has been discussion of whether one should use the doubling method
or Gray code to produce tweak masks. We decided to use doubling, for three
reasons. First, doubling provides us with the framework of XE and XEX
constructions, which makes our constructions and proofs simple and easy to
analyze. Neither our constructions nor our proofs can be directly translated into a
Gray code version, as it is not immediately clear which masks we should use for
the construction to make the proof work. Second, although it was reported that
Gray code performs better than doubling on Intel CPUs [16], a recent study shows
that the doubling method can be implemented equally efficiently [3]. Third, the
speedup of Gray code mask generation requires precomputation and memory,
whereas doubling does not.
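The trade-off in the third point can be made concrete with a small sketch. It is not the authors' implementation: field elements are modelled as Python integers, the reduction polynomial in `dbl` is an assumption, and `gray_masks` follows the usual incremental Gray-code technique (XOR in a table entry selected by the number of trailing zeros of the block index), which is what makes a precomputed table of $2^j L$ values necessary.

```python
MASK128 = (1 << 128) - 1


def dbl(v: int) -> int:
    """Multiply by 2 in GF(2^128); reduction polynomial is an assumption."""
    v <<= 1
    return (v & MASK128) ^ 0x87 if v >> 128 else v


def gf_mul(a: int, b: int) -> int:
    """Schoolbook multiplication in GF(2^128), used here only as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = dbl(a)
        b >>= 1
    return r


def doubling_masks(L: int, count: int):
    """2^1 L, 2^2 L, ...: one shift and conditional XOR per block, no table."""
    out, m = [], L
    for _ in range(count):
        m = dbl(m)
        out.append(m)
    return out


def gray_masks(L: int, count: int):
    """gamma_i * L for the Gray code gamma_i = i ^ (i >> 1), i = 1..count.

    Needs a precomputed table holding 2^j L for every j up to log2(count).
    """
    table = [L]
    while len(table) < max(1, count.bit_length()):
        table.append(dbl(table[-1]))
    out, m = [], 0
    for i in range(1, count + 1):
        ntz = (i & -i).bit_length() - 1   # number of trailing zeros of i
        m ^= table[ntz]                   # gamma_i = gamma_{i-1} ^ (1 << ntz)
        out.append(m)
    return out
```

The per-block work in `gray_masks` is a single XOR, but only after the table has been built and stored; `doubling_masks` keeps no state beyond the previous mask.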

4 Privacy of COPE 

4.1 Security Definition of Online Ciphers 

We use the security definition of online ciphers from Rogaway and Zhang [27].
Let $(\{0,1\}^n)^+$ denote the set of strings whose length is a positive multiple of $n$
bits and is at most $2^n \cdot n$ bits. An online cipher $\mathcal{E} : \mathcal{K} \times (\{0,1\}^n)^+ \to (\{0,1\}^n)^+$
is a function such that it is a permutation on every block of $n$ bits, having
the additional feature that the outputs are the same for a common prefix. In
other words, the first $|M|$ bits of $\mathcal{E}_k(M \| N)$ and $\mathcal{E}_k(M \| N')$ are the same for any
$M, N, N' \in (\{0,1\}^n)^+$. So an online cipher $\mathcal{E}_k$ yields a permutation of $i$-th blocks,
where the permutation is determined by the prefix (i.e. the first $i - 1$ blocks).
Let $\mathrm{OPerm}(n)$ be the set of all such permutations $\pi : (\{0,1\}^n)^+ \to (\{0,1\}^n)^+$.

Definition 3. Let $\mathcal{E}$ be an online cipher. The IND-CPA advantage of a
distinguisher $\mathcal{D}$ is defined as

\[ \mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}}(\mathcal{D}) = \Pr\bigl[\mathcal{D}^{\mathcal{E}_k} = 1\bigr] - \Pr\bigl[\mathcal{D}^{\pi} = 1\bigr]. \]

Here, $\mathcal{D}$ is a distinguisher with oracle access to either $\mathcal{E}_k$ or $\pi$. The probabilities
are taken over $k \leftarrow \mathcal{K}$, $\pi \leftarrow \mathrm{OPerm}(n)$ and random coins of $\mathcal{D}$, if any. By
$\mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}}(t, q, \sigma, \ell)$ we denote the maximum advantage taken over all distinguishers
that run in time $t$ and make $q$ queries, each of length at most $\ell$ blocks, and of
total length at most $\sigma$ blocks.
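The prefix property in this definition can be checked on a toy construction. The sketch below is not COPE: it is a hypothetical online cipher in which the permutation applied to the $i$-th block is a small Feistel network keyed by a hash of the key and the preceding plaintext blocks, and the 8-byte block size is chosen only to keep the example short.

```python
import hashlib

BLOCK = 8  # toy block size in bytes (assumption for illustration)


def _f(key: bytes, prefix: bytes, i: int, half: bytes) -> bytes:
    data = key + b"|" + prefix + b"|" + bytes([i]) + half
    return hashlib.sha256(data).digest()[:BLOCK // 2]


def _perm(key: bytes, prefix: bytes, block: bytes) -> bytes:
    """Permutation on one block, selected by the preceding plaintext prefix."""
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, bytes(a ^ b for a, b in zip(l, _f(key, prefix, i, r)))
    return l + r


def toy_online_encrypt(key: bytes, msg: bytes) -> bytes:
    """Encrypt block by block; block i depends only on blocks 1..i."""
    if not msg or len(msg) % BLOCK:
        raise ValueError("length must be a positive multiple of the block size")
    out = b""
    for off in range(0, len(msg), BLOCK):
        out += _perm(key, msg[:off], msg[off:off + BLOCK])
    return out
```

Two messages that share a prefix $M$ produce ciphertexts that agree on the first $|M|$ bits, and the first differing block is mapped through the same permutation, so the corresponding ciphertext blocks necessarily differ.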

4.2 IND-CPA Proof Sketch 

This section gives a sketch of the proof showing that COPE is secure against 
chosen-plaintext attacks with respect to privacy (IND-CPA). The details of the 
proof can be found in the full paper [5] . 

Theorem 1. Let $\mathcal{E}[E]$ denote COPE, where $E$ is the underlying block cipher.
We have

\[ \mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}[E]}(t, q, \sigma, \ell) \le \frac{38\sigma^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}(t', 4\sigma) + \frac{(\ell+1)(q-1)^2}{2^n}, \]

where $t' \approx t$.

Fig. 3. IND-CPA proof of COPE: introducing dummy masks and rewriting the scheme in
terms of XEX


The proof consists of two steps. First, we rewrite COPE in terms of XEX
constructions.² We introduce dummy masks to the state values, as shown in
Fig. 3. The block cipher calls in the upper layer are now $E_k^{\alpha-1,1,0}$ and those
in the lower layer $E_k^{\alpha,0,0}$. Note that the “$L$” initially XORed to the state now
disappears. We use Lem. 1 to replace the block cipher calls in the upper layer
with random permutations $\pi_{\alpha-1,1,0}$ and those in the lower layer with $\pi_{\alpha,0,0}$ (for
$\alpha = 1, 2, \ldots$). Such a replacement costs us

\[ \frac{9.5\,(2\sigma)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}(t', 2 \cdot 2\sigma) = \frac{38\sigma^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}(t', 4\sigma). \]

We write $\mathcal{E}[\pi]$ to denote the COPE scheme making calls to independent random
permutations $\pi_{\alpha,\beta,\gamma}$ rather than to a block cipher.

Second, we show that $\mathcal{E}[\pi]$ behaves exactly the same as the ideal functionality,
as long as collisions of state values do not occur. Define variables $V[\alpha]$ of state
values as $V[\alpha] = \bigoplus_{i=1}^{\alpha} \pi_{i-1,1,0}(M[i])$, which is also equal to $\pi_{\alpha,0,0}^{-1}(C[\alpha])$.

We look for collisions of these variables. Here by a “collision” roughly we mean
the same value of $V[\alpha]$ coming from different prefixes $M[1]M[2] \cdots M[\alpha]$ and
$M'[1]M'[2] \cdots M'[\alpha]$, for some $\alpha$. More precisely, we have a collision $V[\alpha] =
V'[\alpha]$ if we have $V[\alpha-1] \ne V'[\alpha-1]$ and $V[\alpha] = V'[\alpha]$, which implies we must
have $M[\alpha] \ne M'[\alpha]$ and also $M[i] \ne M'[i]$ for some $i < \alpha$. Let $\mathsf{C}$ denote the
event that a collision of $V[\alpha]$ occurs for some $\alpha$.

Claim. Unless $\mathsf{C}$ occurs, $\mathcal{E}[\pi]$ is indistinguishable from the ideal functionality.
Furthermore, we have $\Pr[\mathcal{D}^{\mathcal{E}[\pi]} \text{ sets } \mathsf{C}] \le (\ell+1)(q-1)^2/2^n$.
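To get a feel for the size of the information-theoretic terms in this proof sketch, they can be evaluated exactly with rational arithmetic. The helper below is only a worked example: it takes the replacement cost $9.5(2\sigma)^2/2^n$ from the first step and a collision term of the form $(\ell+1)(q-1)^2/2^n$ as in the theorem statement, omits the PRP advantage of the block cipher entirely, and uses hypothetical parameter values.

```python
from fractions import Fraction


def cope_information_theoretic_terms(sigma: int, q: int, ell: int, n: int = 128):
    """Return (replacement cost, collision term) as exact fractions."""
    replacement = Fraction(19, 2) * (2 * sigma) ** 2 / 2 ** n   # 9.5 (2 sigma)^2 / 2^n
    collision = Fraction((ell + 1) * (q - 1) ** 2, 2 ** n)
    return replacement, collision


# Hypothetical workload: 2^20 queries of at most 2^20 blocks, 2^40 blocks in total.
replacement, collision = cope_information_theoretic_terms(2 ** 40, 2 ** 20, 2 ** 20)
print(float(replacement), float(collision))
```

The first value simplifies to $38\sigma^2/2^n$, which is the form in which the replacement cost is stated after simplification.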


2 The reason why our IND-CPA proof of COPE is based on XEX constructions, and not
on XEs, is that our COPA, which gives decryption oracle access to adversaries,
builds upon COPE.


5 Privacy and Integrity of COPA

5.1 Security Definition of Authenticated Online Ciphers 

Also for authenticated online ciphers, we use the IND-CPA security advantage
of Def. 3, except that the ideal encryption oracle now has an additional random
function that maps $\{0,1\}^* \times (\{0,1\}^n)^+$ to $\{0,1\}^n$, corresponding to $(A, M) \mapsto T$.

We use the notion of integrity of authenticated encryption schemes from
Fleischmann et al. [11]. By $\bot$ we denote a function that returns $\bot$ on every input.

Definition 4. Let $\mathcal{E}$ be an online cipher. The integrity advantage of a
distinguisher $\mathcal{D}$ is defined as

\[ \mathbf{Adv}^{\mathrm{int}}_{\mathcal{E}}(\mathcal{D}) = \Pr\bigl[\mathcal{D}^{\mathcal{E}_k, \mathcal{E}_k^{-1}} = 1\bigr] - \Pr\bigl[\mathcal{D}^{\mathcal{E}_k, \bot} = 1\bigr]. \]

Here, $\mathcal{D}$ is a distinguisher with oracle access to either $(\mathcal{E}_k, \mathcal{E}_k^{-1})$ or $(\mathcal{E}_k, \bot)$. To
avoid a trivial win, we assume that the distinguisher does not make a query
$(A, C, T)$ if it has made a query $(A, M)$ to the encryption oracle and obtained
$(C, T)$ from the oracle. By $\mathbf{Adv}^{\mathrm{int}}_{\mathcal{E}}(t, q, \sigma, \ell)$ we denote the maximum advantage
taken over all distinguishers that run in time $t$ and make $q$ queries, each of length
at most $\ell$ blocks, and of total length at most $\sigma$ blocks.

5.2 Privacy of COPA 

We now give a proof sketch of the IND-CPA security of COPA. The details can 
be found in the full paper [2]. 

Theorem 2. Let $\mathcal{E}[E]$ denote COPA, where $E$ is the underlying block cipher.
We have

\[ \mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}[E]}(t, q, \sigma, \ell) \le \frac{39(\sigma+q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 4(\sigma+q)\bigr) + \frac{(\ell+2)(q-1)^2}{2^n}, \]

where $t' \approx t$.

The IND-CPA security analysis of COPE carries over, with only minor
modifications. First, we introduce dummy masks in a similar way (to the encryption
part), and replace all XE (in the associated-data part) and XEX constructions
by random permutations. This replacement costs us

\[ \frac{9.5\,(2\sigma + 2q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 2 \cdot 2(\sigma+q)\bigr) = \frac{38(\sigma+q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 4(\sigma+q)\bigr). \]

Write $\mathcal{E}[\widetilde{\pi}, \pi]$ to denote the COPA scheme calling random permutations instead
of a block cipher.

Next, we again use the collision event $\mathsf{C}$, but introduce two more events. One
is $\mathsf{A}$, which is the event that we have a collision of $V$ for two different associated
data. Recall that for $A = \varnothing$, we have $V = 0$. The other is $\mathsf{T}$, which is the event
that we have a collision of tag values for messages of the same length (or more
precisely, a collision of input values to a random permutation that produces
tags).

Claim. Unless $\mathsf{A} \lor \mathsf{C} \lor \mathsf{T}$ occurs, $\mathcal{E}[\widetilde{\pi}, \pi]$ is indistinguishable from the ideal
functionality.

Lemma 2 (PMAC1, [25]). The function $H[\widetilde{\pi}] : \{0,1\}^* \to \{0,1\}^n$ ($A \mapsto V$) is
indistinguishable from a random function $\Phi : \{0,1\}^* \to \{0,1\}^n$. Specifically, the
distinguishing advantage (defined accordingly, only forward queries) is at most
$q^2/2^n$. Here, $\{0,1\}^*$ denotes the set of strings whose length is at most $2^n \cdot n$ bits.

So now we replace XE and XEX constructions with random permutations and
$H[\widetilde{\pi}]$ with a random function $\Phi$. Denote the scheme by $\mathcal{E}[\Phi, \pi]$. Then we have
the following.

Claim. We have $\Pr[\mathcal{D}^{\mathcal{E}[\Phi,\pi]} \text{ sets } \mathsf{A}] \le q^2/2^n$ and
$\Pr[\mathcal{D}^{\mathcal{E}[\Phi,\pi]} \text{ sets } \mathsf{C} \lor \mathsf{T} \mid \neg\mathsf{A}] \le (\ell+2)(q-1)^2/2^n$.


5.3 Integrity of COPA 

The integrity proof of COPA is more involved than the privacy proofs and we 
include the full proof in this paper. We prove the following theorem: 

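Theorem 3. Let $\mathcal{E}[E]$ denote COPA, where $E$ is the underlying block cipher.
We have

\[ \mathbf{Adv}^{\mathrm{int}}_{\mathcal{E}[E]}(t, q, \sigma, \ell) \le \frac{39(\sigma+q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 4(\sigma+q)\bigr) + \frac{(\ell+2)(q-1)^2}{2^n} + \frac{2q}{2^n}, \]

where $t' \approx t$.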

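Let $\mathsf{F}$ denote the event that the decryption oracle $\mathcal{E}_k^{-1}$ returns something other
than $\bot$. Clearly the two games are the same as long as the event $\mathsf{F}$ does not
occur, so we have

\[ \Pr\bigl[\mathcal{D}^{\mathcal{E}_k, \mathcal{E}_k^{-1}} = 1\bigr] - \Pr\bigl[\mathcal{D}^{\mathcal{E}_k, \bot} = 1\bigr] \le \Pr\bigl[\mathcal{D}^{\mathcal{E}_k^{\pm1}} \text{ sets } \mathsf{F}\bigr]. \]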

In the rest of this section we shall bound this probability. First, as usual, we
replace block cipher calls with random permutations $\widetilde{\pi}, \pi$. Then we replace the
PMAC1 part of processing associated data with a random function $\Phi$. These all
together cost us (cf. the proof of Thm. 2)

\[ \frac{38(\sigma+q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 4(\sigma+q)\bigr) + \frac{q^2}{2^n} \le \frac{39(\sigma+q)^2}{2^n} + \mathbf{Adv}^{\mathrm{prp}\pm1}_{E}\bigl(t', 4(\sigma+q)\bigr). \]

Removing “Privacy Part”. Define events $\mathsf{A}$, $\mathsf{C}$ and $\mathsf{T}$ as we have done in
the privacy proof of Thm. 2. Note that these events are defined in terms of
variables $V[\alpha]$, where we also define $V[0] := V$ and $V'[d+1]$ the input value to
the block cipher that produces a tag. We define these values as being set only by
the queries to the encryption oracle $\mathcal{E}$. We do not let queries to the decryption
oracle $\mathcal{E}^{-1}$ affect variables $V[\cdot]$, $V'[\cdot]$, whether or not it returns a message or $\bot$.
Set $\mathsf{E} := \mathsf{A} \lor \mathsf{C} \lor \mathsf{T}$.

Next we define similar events $\mathsf{A}'$, $\mathsf{C}'$ and $\mathsf{T}'$. These are exactly the same as
the previous ones, except that now we consider only those events (i.e. collisions
of $V[\cdot]$ or $V'[\cdot]$) that occur prior to a forgery (that is, under the condition $\neg\mathsf{F}$).
Again, set $\mathsf{E}' := \mathsf{A}' \lor \mathsf{C}' \lor \mathsf{T}'$.

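When we consider event $\mathsf{F}$, we would like to assume that we are under the
condition $\neg\mathsf{E}'$, meaning that the encryption oracle $\mathcal{E}$ has behaved ideally so far
(till forgery). To do this, we use the inequality

\[ \Pr\bigl[\mathcal{D}^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{F}\bigr] \le \Pr\bigl[\mathcal{D}^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{F} \mid \neg\mathsf{E}'\bigr] + \Pr\bigl[\mathcal{D}^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{E}'\bigr]. \]

We shall construct a distinguisher $\mathcal{D}'$ that breaks the privacy of the encryption
oracle $\mathcal{E}$. The distinguisher $\mathcal{D}'$ uses $\mathcal{D}$, and the query complexity of $\mathcal{D}'$ is at
most that of $\mathcal{D}$. Specifically, $\mathcal{D}'$ starts running $\mathcal{D}$, answering $\mathcal{E}$-queries using its
$\mathcal{E}$ oracle, and whenever $\mathcal{D}$ makes a query to the decryption oracle $\mathcal{E}^{-1}$, $\mathcal{D}'$ replies
with $\bot$.

Claim. We have $\Pr[\mathcal{D}^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{E}'] \le q^2/2^n + (\ell+2)(q-1)^2/2^n$.

Proof. Note that if $\mathcal{D}^{\mathcal{E}^{\pm1}}$ sets $\mathsf{E}'$, then until this event $\mathcal{D}'$ simulates the environment
of $\mathcal{D}$ correctly. Hence we get $\Pr[\mathcal{D}^{\mathcal{E}^{\pm1}} \text{ sets } \mathsf{E}'] \le \Pr[\mathcal{D}'^{\mathcal{E}} \text{ sets } \mathsf{E}]$, which is at most
$q^2/2^n + (\ell+2)(q-1)^2/2^n$ as shown in the privacy proof. □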

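Passing to a Single-Query Adversary. So it remains to evaluate the
probability that $\mathcal{D}$ sets $\mathsf{F}$ under the condition $\neg\mathsf{E}'$. We shall construct a forger $\mathcal{D}_1$
from $\mathcal{D}$. The forger $\mathcal{D}_1$ makes multiple queries to the encryption oracle $\mathcal{E}$ but
makes only one query to the decryption oracle $\mathcal{E}^{-1}$ at the end of its run. We
define $\mathcal{D}_1$ as follows: it chooses a random index $i \in [1, q]$. It then runs $\mathcal{D}$,
answering its $\mathcal{E}$-queries using the $\mathcal{E}$ oracle of $\mathcal{D}_1$ and answering the queries to the
decryption oracle $\mathcal{E}^{-1}$ with $\bot$. When $\mathcal{D}$ makes the $i$-th query $(A^*, C^*, T^*)$ to
the decryption oracle, $\mathcal{D}_1$ outputs the query $(A^*, C^*, T^*)$ and stops (or more
precisely, makes that query to the decryption oracle $\mathcal{E}^{-1}$ and stops).

Claim. We have $\Pr[\mathcal{D}^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{F} \mid \neg\mathsf{E}'] \le q \Pr[\mathcal{D}_1^{\mathcal{E}^{\pm1}[\Phi,\pi]} \text{ sets } \mathsf{F} \mid \neg\mathsf{E}']$.

Proof. Let $\mathsf{F}_h$ denote the event that at the $h$-th query the decryption oracle $\mathcal{E}^{-1}$
returns something other than $\bot$ for the first time; that is, the oracle has returned
only $\bot$ so far. Clearly these are disjoint events, and we have $\mathsf{F} = \bigvee_{h=1}^{q} \mathsf{F}_h$. Then,
under the events $\neg\mathsf{E}'$ and $i = h$, the forger correctly simulates the game of $\mathcal{D}$.
Therefore, we get $\Pr[\mathcal{D}_1^{\mathcal{E}^{\pm1}} \text{ sets } \mathsf{F} \mid \neg\mathsf{E}'] \ge \sum_{h=1}^{q} \Pr[(i = h) \land \mathcal{D}^{\mathcal{E}^{\pm1}} \text{ sets } \mathsf{F}_h \mid \neg\mathsf{E}'] \ge
(1/q) \Pr[\mathcal{D}^{\mathcal{E}^{\pm1}} \text{ sets } \mathsf{F} \mid \neg\mathsf{E}']$. □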
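The factor $q$ lost in this step comes purely from guessing the index of the first successful decryption query. A tiny exact calculation illustrates it, under the simplifying assumption that the index is drawn uniformly and independently of the run; the probabilities used in the usage line are made up for the example.

```python
from fractions import Fraction


def single_query_success(first_forgery_probs):
    """Success probability of the single-query forger.

    first_forgery_probs[h - 1] is Pr[F_h], the probability that the h-th
    decryption query is the first one answered with something other than
    the rejection symbol. The forger guesses i uniformly from 1..q and
    succeeds exactly when F_i occurs.
    """
    q = len(first_forgery_probs)
    return sum(Fraction(1, q) * p for p in first_forgery_probs)


# Hypothetical distribution over q = 4 decryption queries.
probs = [Fraction(1, 10), Fraction(0), Fraction(1, 5), Fraction(1, 20)]
print(single_query_success(probs), sum(probs))
```

Since the events $\mathsf{F}_h$ are disjoint, multiplying the forger's success probability by $q$ recovers $\Pr[\mathsf{F}]$, which is the inequality in the claim.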
Evaluating Forgery Probabilities. Let $(A^*, C^*, T^*)$ denote the (non-trivial)
query made by $\mathcal{D}_1$ to the decryption oracle $\mathcal{E}^{-1}[\Phi, \pi]$. We shall evaluate the
probability that this would make $\mathcal{E}^{-1}$ return something other than $\bot$. To evaluate
the probability, we shall consider the following cases.

Lemma 3 (Case 1). If $A^*$ or $T^*$ is new, or $C^*$ contains a new block, then the
probability of a forgery is at most $2/2^n$.

Proof. If $A^*$ is new, then it means that it triggers the random function $\Phi$ and
yields a fresh random value $V \leftarrow \Phi(A^*)$. This value is XORed to the value that
is input to the block cipher to produce the tag, which must be equal to $T^*$. All
other values XORed to the value are independent of $V$. Hence, regardless of the
values $C^*, T^*$, the probability of such an event is at most $1/2^n$.

Say that $A^*$ is not new, but that $C^*$ contains a new block. Let $C^*[\alpha]$ be one
of the new blocks. The decryption invokes $\pi_{\alpha,0,0}^{-1}(C^*[\alpha])$, which is sampled from
the set of at least $2^n - q$ points. Therefore, the probability of a forgery is at most
$1/(2^n - q) \le 2/2^n$, assuming $q \le 2^{n-1}$.

Say that $A^*$ is not new, $C^*$ does not contain a new block, but $T^*$ is new. This
is similar to the previous case. This would trigger a fresh point of $\pi_{d^*-1,0,1}^{-1}(T^*)$,
where $d^*$ denotes the number of blocks in the message $M^*$. The point is sampled
from the set of at least $2^n - q$ points. Therefore, the probability of a forgery is
at most $1/(2^n - q) \le 2/2^n$. □
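The step $1/(2^n - q) \le 2/2^n$ used twice in this proof holds exactly when $q \le 2^{n-1}$. The short check below confirms this exhaustively for small toy values of $n$; it is a sanity check of the arithmetic only, not part of the proof.

```python
from fractions import Fraction


def fresh_point_bound(n: int, q: int) -> Fraction:
    """Probability of hitting one fixed value among at least 2^n - q points."""
    return Fraction(1, 2 ** n - q)


def bound_holds(n: int, q: int) -> bool:
    return fresh_point_bound(n, q) <= Fraction(2, 2 ** n)


for n in range(2, 9):
    # The inequality holds for every q up to 2^(n-1), with equality at the end ...
    assert all(bound_holds(n, q) for q in range(0, 2 ** (n - 1) + 1))
    # ... and fails as soon as q exceeds 2^(n-1).
    assert not bound_holds(n, 2 ** (n - 1) + 1)
```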

Lemma 4 (Case 2). If $A^*$ and $T^*$ are old, and $C^*$ consists of old blocks, then
the probability of a forgery is at most $2/2^n$.

Proof. To handle this case, we introduce some notation. For the query $(A^*, C^*, T^*)$
in question, divide $C^*$ into blocks as $C^*[1]C^*[2] \cdots C^*[d^*] \leftarrow C^*$ and define
$C^*[0] := A^*$ and $C^*[d^*+1] := T^*$. We then focus on a pair of adjacent “blocks”
$(C^*[\alpha-1], C^*[\alpha])$ for $\alpha = 1, 2, \ldots, d^*+1$. We call a pair old if it (as a pair)
has already appeared in some previous query made to the encryption oracle $\mathcal{E}$
and in the corresponding value returned by the oracle. That is, if $\mathcal{D}_1$ has made
a query $(A', M')$ to the oracle and got $(C', T')$ back, then we check if the pair
in question $(C^*[\alpha-1], C^*[\alpha])$ is contained in $(A', C', T')$; that is, we check if
$(C^*[\alpha-1], C^*[\alpha]) = (C'[\alpha-1], C'[\alpha])$ holds, where $C'[0]$ and $C'[d'+1]$ are defined
similarly. We do this for all previous queries. We call the pair $(C^*[\alpha-1], C^*[\alpha])$
new otherwise.
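The old/new classification of adjacent pairs is mechanical and can be sketched directly. The function names and the representation of queries as `(A, [C blocks], T)` tuples are assumptions made for this illustration; pairs are compared position by position, as in the definition above.

```python
def as_sequence(A, C_blocks, T):
    """Lay out a query as C[0] = A, C[1..d] = ciphertext blocks, C[d+1] = T."""
    return [A] + list(C_blocks) + [T]


def new_pair_positions(forgery, history):
    """Positions alpha in 1..d*+1 whose pair (C*[alpha-1], C*[alpha]) is new.

    `forgery` is the decryption query (A*, C*, T*); `history` lists the
    (A', C', T') triples from earlier encryption queries and their answers.
    """
    seen = set()
    for prev in history:
        seq = as_sequence(*prev)
        for alpha in range(1, len(seq)):
            seen.add((alpha, seq[alpha - 1], seq[alpha]))
    star = as_sequence(*forgery)
    return [alpha for alpha in range(1, len(star))
            if (alpha, star[alpha - 1], star[alpha]) not in seen]


history = [(b"A1", [b"c1", b"c2"], b"t1"),
           (b"A1", [b"c1", b"c9"], b"t2")]
forgery = (b"A1", [b"c1", b"c9"], b"t1")   # every component is old ...
print(new_pair_positions(forgery, history))  # ... yet the last pair is new
```

In the usage lines every individual block of the forgery has been seen before, but the pair formed by the last ciphertext block and the tag has not, which is the situation Case 2 is concerned with.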

Note that the query $(A^*, C^*, T^*)$ always contains a new pair. If $(A^*, C^*, T^*)$
contained no new pairs, then, given the non-triviality of the query, we must have
observed a collision, contradicting the assumption $\neg\mathsf{E}'$.

We now make a distinction among new pairs $(C^*[\alpha-1], C^*[\alpha])$ based on the
decrypted message block $M^*[\alpha]$ from the two adjacent ciphertext blocks. We say
that a pair is collapsing if there exists a previous query $(A', M')$ made by $\mathcal{D}_1$ to
the encryption oracle $\mathcal{E}$ such that $M'[\alpha] = M^*[\alpha]$.

There exists a new pair $(C^*[\alpha-1], C^*[\alpha])$ that is not collapsing. This case
means that we trigger a random sampling of $\pi_{\alpha-1,1,0}^{-1}$ to compute $M^*[\alpha]$. Then,
note that the value $\Sigma^* = M^*[1] \oplus M^*[2] \oplus \cdots \oplus M^*[d^*]$ is already uniquely
determined by the values $C^*[d^*]$ and $T^*$ (via Fig. 2a). There are at least $2^n - q$
possible values for $M^*[\alpha]$, and the message blocks must sum up to this particular
value $\Sigma^*$, which happens with a probability at most $1/(2^n - q) \le 2/2^n$.

All new pairs in $(A^*, C^*, T^*)$ are collapsing. This final case is quite different
from the previous ones above, as we do not have any fresh sampling of
permutations $\pi$ or the random function $\Phi$ in evaluating $\mathcal{E}^{-1}[\Phi, \pi](A^*, C^*, T^*)$. To
tackle this case, we shall convert this forgery game into one where the
adversary $\mathcal{D}^\circ$ tries to find multiple collisions by outputting the following set of values,
without making any query to the oracles:

1. $r \in [1, \ell+1]$,

2. $1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_r \le \ell+1$,

3. $(A_1, M_1), (A_2, M_2), \ldots, (A_r, M_r)$, and

4. $(A'_1, M'_1), (A'_2, M'_2), \ldots, (A'_r, M'_r)$.

The adversary $\mathcal{D}^\circ$ “wins” if the submitted values form a multi-collision in the
following sense: $(A_i, M_i)$ and $(A'_i, M'_i)$ collide at the $\alpha_i$-th block, for all $i \in [1, r]$.
The adversary $\mathcal{D}^\circ$ runs $\mathcal{D}_1$, simulating the $\mathcal{E}$ oracle with an ideal functionality.
Note that this simulation is correct under the condition $\neg\mathsf{E}'$. When $\mathcal{D}_1$ outputs
$(A^*, C^*, T^*)$, $\mathcal{D}^\circ$ first checks for new pairs contained in it. Let $1 \le \alpha_1 < \alpha_2 <
\cdots < \alpha_r \le \ell+1$ be the positions of new pairs. Then $\mathcal{D}^\circ$ checks the history of
values $(C, T)$ that it returned. Note that under $\neg\mathsf{E}'$, a block $C^*[\alpha]$ determines a
unique prefix $AM$. We choose $(A_i, M_i)$ to be the prefix determined by $C^*[\alpha_i]$.
To choose $(A'_i, M'_i)$, let $A'_i M''$ be the prefix determined by $C^*[\alpha_i - 1]$. Then $\mathcal{D}^\circ$
chooses randomly, from the previously queried values, a message block $M[\alpha_i] \ne
M_i[\alpha_i]$. Set $M'_i := M'' M[\alpha_i]$. The adversary $\mathcal{D}^\circ$ does this for $i = 1, 2, \ldots$ except
for the last block.

- If $\alpha_r < d^* + 1$, then we know the message checksum $\Sigma^* = M^*[1] \oplus \cdots \oplus
M^*[d^*]$, so $\mathcal{D}^\circ$ does not have to guess the value of $M'_r[\alpha_r]$.

- If $\alpha_r = d^* + 1$, then we simply set the last input value to be the checksum
of all previous (guessed) message blocks.

Now we observe that as long as all the guesses of the message blocks are correct,
$\mathcal{D}^\circ$ wins if $\mathcal{D}_1$ succeeds in forgery of this type. It should be noted that the values
returned by $\mathcal{E}$ are independent of the success probabilities in question, under the
event $\neg\mathsf{E}'$. Therefore, for a fixed $r$,

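\[ \Pr[\mathcal{D}^\circ \text{ wins} \mid r] \ge \frac{1}{q-1} \cdots \frac{1}{q-1} \Pr[\mathcal{D}_1 \text{ forges} \mid r] = \frac{1}{(q-1)^{r-1}} \Pr[\mathcal{D}_1 \text{ forges} \mid r]. \]

We then calculate $\Pr[\mathcal{D}^\circ \text{ wins} \mid r]$. We do this by lazy sampling of the
permutations, and we see that, for a fixed $r$,

\[ \Pr[\mathcal{D}^\circ \text{ wins} \mid r] \le \frac{1}{2^n - 1} \cdots \frac{1}{2^n - 1} = \frac{1}{(2^n - 1)^r}. \]

Hence by varying $r$ we get in total

\[ \Pr[\mathcal{D}_1 \text{ forges}] \le \sum_{r} \frac{(q-1)^{r-1}}{(2^n - 1)^r} \Pr[j = r] \le \frac{1}{2^n - 1} \sum_{r} \Pr[j = r] = \frac{1}{2^n - 1}. \]


6 Efficient Parallel Implementation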

6.1 The Setting 

We compare our schemes to some prominent existing online ciphers: TC1 and
TC3 [27], being the most efficient previous schemes; and MCBC [21] as a
representative for a scheme relying only on block cipher invocations (as opposed to
tweakable block ciphers or universal hashing). The modes HCBC1 and MHCBC
are implicitly covered by TC1 and TC3, and HCBC2 has a performance inferior
to TC3.

For the case of authenticated online ciphers, we exclude modes of operation
and dedicated designs that are based on a nonce and rely on its non-reuse (e.g.,
GCM [15], OCB [16], ALE [7], and AEGIS [28]). Therefore, we compare our
COPA design to the McOE family of authenticated encryption algorithms [11],
which, to the best of our knowledge, is the only other online scheme not relying
on the non-reuse of a nonce. We focus on the McOE-G instance, since McOE-X
itself is not secure [?], featuring a key recovery with birthday complexity.

For the concrete instantiation of all schemes, we use the AES-128 [10] as
the underlying block cipher, and multiplication in $\mathrm{GF}(2^{128})$ as an almost
XOR-universal hash function [?]. As target platform for the implementations, we
chose the recent generation of Intel microprocessors (Westmere or later), which
support the AES-NI instruction set [12] and carryless multiplication [13].


6.2 Implementation Characteristics of COPE and COPA 

The online modes proposed in this paper can utilize parallelized execution of 
block cipher calls in two ways: for messages longer than one block, the encryp- 
tions of subsequent message blocks can be carried out independently of each 
other once the respective masks have been XORed. The same holds for the sec- 
ond series of block cipher calls, once the chaining XORs have been executed. 

This parallelism can be exploited in a single-core scenario by pipelining the 
block cipher rounds for several consecutive block cipher invocations. Similarly, 
these invocations can be processed independently by multiple threads, with the 
recombination being the computation of the chaining. Note that both scenar- 
ios can be combined when multiple cores with pipelined block cipher calls are 
available, which is typically the case for Intel’s AES-NI architecture. 
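The two sources of parallelism described above can be seen in a structural sketch of the two-layer encryption. This is a hedged illustration, not the authors' implementation: `E`/`E_inv` are a toy Feistel permutation standing in for AES, the mask derivation `L = E(key, 0)` and the reduction polynomial in `dbl` are assumptions, and the upper/lower mask indices follow the $(\alpha-1, 1, 0)$ and $(\alpha, 0, 0)$ tweaks used in the proof. Python threads will not yield a real speed-up here; the thread pool only marks which calls are independent.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def dbl(x):
    """Multiply by 2 in GF(2^128); reduction polynomial is an assumption."""
    v = int.from_bytes(x, "big") << 1
    if v >> 128:
        v = (v & ((1 << 128) - 1)) ^ 0x87
    return v.to_bytes(16, "big")


def _f(key, i, half):
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def E(key, block):
    """Toy 4-round Feistel stand-in for the block cipher (NOT AES)."""
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, xor(l, _f(key, i, r))
    return l + r


def E_inv(key, block):
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = xor(r, _f(key, i, l)), l
    return l + r


def masks(L, d):
    top, bottom, m = [], [], L
    for _ in range(d):
        top.append(xor(dbl(m), m))   # 3 * 2^(i-1) * L for the upper layer
        m = dbl(m)
        bottom.append(m)             # 2^i * L for the lower layer
    return top, bottom


def cope_encrypt(key, blocks, workers=4):
    L = E(key, bytes(16))            # assumed mask derivation
    top, bottom = masks(L, len(blocks))
    with ThreadPoolExecutor(workers) as pool:
        # Layer 1: calls are independent once the masks are XORed in.
        ys = list(pool.map(lambda x: E(key, x),
                           [xor(b, t) for b, t in zip(blocks, top)]))
        # Chaining: serial, but XOR only.
        vs, state = [], L
        for y in ys:
            state = xor(state, y)
            vs.append(state)
        # Layer 2: independent again once the chaining values are known.
        zs = list(pool.map(lambda v: E(key, v), vs))
    return [xor(z, b) for z, b in zip(zs, bottom)]


def cope_decrypt(key, cblocks):
    L = E(key, bytes(16))
    top, bottom = masks(L, len(cblocks))
    vs = [E_inv(key, xor(c, b)) for c, b in zip(cblocks, bottom)]
    out, prev = [], L
    for v, t in zip(vs, top):
        out.append(xor(E_inv(key, xor(v, prev)), t))
        prev = v
    return out
```

The only inherently sequential part is the running XOR between the two layers, which is what the text refers to as the recombination step.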

On the recent Sandy and Ivy Bridge platforms, the AES round function can be 
computed at a latency of 8 cycles with a throughput of 1 cycle. Consequently, 
to fully utilize the pipeline, our implementation issues 8 AES round function 
evaluations on the next 8 consecutive blocks (independent data and same key). 
The tweak masks are computed using dedicated multiplication routines for $2^\alpha$,
$3^\beta$ and $7^\gamma \in \mathrm{GF}(2^{128})$. By contrast, the general $\mathrm{GF}(2^{128})$ multiplication needed
for TC1, TC3, and McOE-G is implemented using the PCLMULQDQ carryless
multiplication instruction followed by modular reduction.

Table 1. Software performance of (authenticated) online ciphers based on the AES on
the Intel Sandy Bridge platform (AES-NI). All numbers are given in cycles per byte
(cpb).

Algorithm                    message length (bytes)
              128     256     512    1024    2048    4096    8192
CTR          1.74    1.27    1.05    0.93    0.86    0.83    0.82
TC1          9.00    8.75    8.65    8.60    8.56    8.56    8.56
TC3          9.08    8.82    8.72    8.67    8.63    8.63    8.62
MCBC        11.66   11.00   10.68   10.52   10.44   10.40   10.38
COPE         2.56    2.08    1.89    1.78    1.72    1.70    1.69
McOE-G      10.85    9.73    9.14    8.90    8.74    8.69    8.66
COPA         3.78    2.85    2.31    2.06    1.94    1.88    1.85


6.3 Performance Measurements 

We provide performance data for the (authenticated) encryption of messages of
length $16 \cdot 2^b$ bytes, with $3 \le b < 10$. The performance of AES-CTR is provided
as a reference point. All measurements were taken on a single core of an Intel
Core i5-2520M CPU at 2500 MHz, averaged over $5 \cdot 10^5$ repetitions, processing
one message at a time. Our findings are summarized in Table 1. All numbers are
given in cycles per byte (cpb).

One can observe that the parallelizability of the proposed schemes results in
speed-ups of a factor of 4.5 to 5 in comparison to the existing modes, at least
for somewhat longer messages. By fully utilizing the pipeline,
our schemes are only marginally slower than two times AES-CTR, which implies
that the overhead imposed by the computation of the masks and the chaining is
kept at a minimum. The authenticated mode COPA carries the additional
overhead of two more AES calls plus field arithmetic for finalization, but this quickly
becomes insignificant as the message length increases. Note, however, that some
constant overhead in comparison to the unauthenticated mode remains even for
very long messages: this can be attributed to the fact that the computation of
the checksum does not allow overwriting the message blocks, leading to increased
register pressure. We also note that with the availability of carryless
multiplication, TC1 and TC3 can be implemented more efficiently than the purely block
cipher-based MCBC, which was created with the goal of improving performance
by avoiding field arithmetic.
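The speed-up factors quoted above can be recomputed from the figures in Table 1. The snippet below simply divides the published cycles-per-byte values; the dictionary layout and function name are choices made for this example.

```python
LENGTHS = [128, 256, 512, 1024, 2048, 4096, 8192]   # message length in bytes

CPB = {  # cycles per byte, copied from Table 1
    "CTR":    [1.74, 1.27, 1.05, 0.93, 0.86, 0.83, 0.82],
    "TC1":    [9.00, 8.75, 8.65, 8.60, 8.56, 8.56, 8.56],
    "TC3":    [9.08, 8.82, 8.72, 8.67, 8.63, 8.63, 8.62],
    "MCBC":   [11.66, 11.00, 10.68, 10.52, 10.44, 10.40, 10.38],
    "COPE":   [2.56, 2.08, 1.89, 1.78, 1.72, 1.70, 1.69],
    "McOE-G": [10.85, 9.73, 9.14, 8.90, 8.74, 8.69, 8.66],
    "COPA":   [3.78, 2.85, 2.31, 2.06, 1.94, 1.88, 1.85],
}


def speedup(slow: str, fast: str):
    """Ratio of cpb values per message length (greater than 1: `fast` wins)."""
    return {n: s / f for n, s, f in zip(LENGTHS, CPB[slow], CPB[fast])}


for slow, fast in [("TC1", "COPE"), ("McOE-G", "COPA"), ("COPE", "CTR")]:
    ratios = speedup(slow, fast)
    print(slow, "/", fast, {n: round(r, 2) for n, r in ratios.items()})
```

For 128-byte messages the ratios are noticeably smaller than for 8192-byte messages, which is why the factor of 4.5 to 5 is qualified as applying to somewhat longer messages.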

The performance of our parallelizable schemes COPE and COPA can be fur- 
ther improved by utilizing multiple cores. Our implementation of multithreaded 
encryption confirms the intuition that one can expect a nearly linear speedup 
when using multiple cores for computing our schemes (i.e., the cost is < 1 cpb 
for two cores and so on). 




7 Conclusion 

By presenting COPE, our work provides the first solution for a parallelizable
online cipher. Building on COPE, we go on to construct COPA, the first
parallelizable and nonce-misuse resistant online authenticated encryption scheme. Our
implementations of COPE and COPA with Intel AES-NI on a Sandy Bridge
processor architecture benefit strongly from the parallelism, which gives us
speed-ups of about a factor of 5 in comparison to the existing (serial) online ciphers TC1, TC3,
MCBC and the online AE scheme McOE-G.

Our designs additionally employ only a single key and use only a block cipher
as a building block (as opposed to tweakable block ciphers or universal hash
functions). We prove that our cipher COPE is an IND-CPA secure online
permutation. The privacy result also carries over to COPA. The integrity proof
of COPA uses a technique of converting a forgery to a set of multiple collisions.
It seems that the technique has not been used before in security proofs of
parallelizable authenticated encryption modes or message authentication codes. The
technique may be applicable to other new types of parallelizable modes of
operation. We leave it as an interesting open problem to construct a scheme with
fewer primitive calls but with comparable security guarantees.

A Handling Arbitrary-Length Messages

A.1 COPE for Arbitrary-Length Messages

Our solution relies on the XLS construction [?] of VIL tweakable ciphers. XLS
makes only three block-cipher calls and requires only simple bit operations
outside block-cipher calls.

Let $\widetilde{E} : \mathcal{K} \times \mathcal{T} \times \{0,1\}^n \to \{0,1\}^n$ be a tweakable cipher and $E : \mathcal{K}' \times \{0,1\}^n \to
\{0,1\}^n$ a block cipher. Then $\mathrm{XLS}[\widetilde{E}, E]$ yields a VIL permutation on $\{0,1\}^{n+*}$,
the set of strings whose length is between $n$ bits and $2n - 1$ bits. Specifically, we
get $\mathrm{XLS}[\widetilde{E}, E] : \mathcal{K} \times \mathcal{K}' \times \mathcal{T} \times \{0,1\}^{n+*} \to \{0,1\}^{n+*}$. Using an appropriate choice of
$(\alpha, \beta, \gamma)$, we can realize the ciphers used in XLS by the underlying block cipher
in the COPE encryption scheme $\mathcal{E}$, dependent on the message length $d$. So we write
$\mathrm{XLS}_{k,d}$ to denote the XLS invocation in COPE.

Let $M$ be a message of at least $n$ bits. Divide it into blocks as
$M[1]M[2] \cdots M[d-1]M[d] \leftarrow M$, and assume that we have $1 \le |M[d]| \le n - 1$. Then we can define
$C \leftarrow \overline{\mathcal{E}}_k(M)$ as

$C[1]C[2] \cdots C[d-2], S \leftarrow \mathcal{E}_k(M[1]M[2] \cdots M[d-2])$ (let $\mathcal{E}_k$ output $S$ for now)
$C[d-1]C[d] \leftarrow \mathrm{XLS}_{k,d}\bigl((M[d-1] \oplus S) \,\|\, M[d]\bigr)$
$C \leftarrow C[1]C[2] \cdots C[d]$.

The IND-CPA proof of COPE carries over with minor modifications. Note that
we have to “wait” with the processing of $M[d-1]$ until receiving $M[d]$ (or “redo” after
receiving), making the scheme less online. Yet, we make only three calls to the
block cipher to process these two blocks.

We require $|M| \ge n$. As pointed out by [27], it seems a challenging problem to
handle the case $|M| < n$ with encryption-only online ciphers in a secure manner.


A.2 COPA for Arbitrary-Length Messages

Arbitrary-length messages can also be handled for COPA. This time we
can take advantage of the tag to handle even the case $|M| < n$. The solution
for the case $|M| > n$ also becomes more efficient owing to the presence of tags.

Tag Splitting for $|M| < n$. We can do a trick similar to tag splitting [11] if
$|M| < n$. We first choose appropriate parameters $(\alpha, \beta, \gamma)$ to make it independent
of the ordinary COPA encryption algorithm $\mathcal{E}$. Write it $\widetilde{\mathcal{E}}_k$ (which will be used
only for fractional one-block messages). Given $M$ such that $|M| = s < n$, we can
define $(C, T) \leftarrow \overline{\mathcal{E}}_k(M)$ as

$(C', T') \leftarrow \widetilde{\mathcal{E}}_k(M10^*)$
$C \leftarrow \lceil C' \rceil_s$ (leftmost $s$ bits)
$T \leftarrow \lfloor C' \rfloor_{n-s} \,\|\, \lceil T' \rceil_s$.

One can directly verify the security of this extension. Note that the integrity
relies on the 10* padding as well as on the “partial” tag $\lceil T' \rceil_s$.
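The bit bookkeeping of tag splitting can be illustrated on bit strings. This is a sketch under assumptions: bits are modelled as Python strings of '0'/'1', `C'` and `T'` are taken as given (the encryption call itself is not modelled), and the partial tag is assumed to be the leftmost $s$ bits of `T'`, which is one reading of the notation.

```python
def pad10(m_bits: str, n: int) -> str:
    """10* padding of a fractional block to n bits."""
    if not 0 < len(m_bits) < n:
        raise ValueError("tag splitting applies to 0 < |M| < n")
    return m_bits + "1" + "0" * (n - len(m_bits) - 1)


def tag_split(c_prime: str, t_prime: str, s: int):
    """Split n-bit C' and T' into an s-bit ciphertext C and an n-bit tag T."""
    n = len(c_prime)
    if len(t_prime) != n or not 0 < s < n:
        raise ValueError("C' and T' must both be n bits and 0 < s < n")
    C = c_prime[:s]                      # leftmost s bits of C'
    T = c_prime[s:] + t_prime[:s]        # remaining n - s bits of C', then s bits of T'
    return C, T


# Toy parameters: n = 8, |M| = 3, with made-up values for C' and T'.
print(pad10("101", 8))
print(tag_split("10110010", "01101100", 3))
```

The output has $s + n$ bits in total, the same as an $s$-bit ciphertext plus a full tag, so no ciphertext expansion occurs; all of `C'` is retained across `C` and `T`, at the price of keeping only $s$ bits of `T'`.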

XLS for $|M| > n$. Our solution for this case is similar to that of COPE but is
more efficient, in that COPA still remains fully online. Again, let $M$ be a message
whose length is more than $n$ bits. Divide it into blocks as
$M[1]M[2] \cdots M[d-1]M[d] \leftarrow M$, and assume that we have $1 \le |M[d]| \le n - 1$. Then we can define
$(C, T) \leftarrow \overline{\mathcal{E}}_k(M)$ as

$(C', T') \leftarrow \mathcal{E}_k(M[1]M[2] \cdots M[d-1])$
$C[d]T \leftarrow \mathrm{XLS}_{k,d}(M[d]T')$
$C \leftarrow C'C[d]$,

where $\mathrm{XLS}_{k,d}$ is defined similarly to the case of COPE. Given the security of
COPA and XLS, it is straightforward to verify that this extension is also secure.
Abstract. We show how to construct an ideal cipher with $n$-bit blocks
and $n$-bit keys (i.e. a set of $2^n$ public $n$-bit permutations) from a small
constant number of $n$-bit random public permutations. The construction
that we consider is the single-key iterated Even-Mansour cipher,
which encrypts a plaintext $x \in \{0,1\}^n$ under a key $k \in \{0,1\}^n$ by
alternately XORing the key $k$ and applying independent random public
$n$-bit permutations $P_1, \ldots, P_r$ (this construction is also named a
key-alternating cipher). We analyze this construction in the plain
indifferentiability framework of Maurer, Renner, and Holenstein (TCC 2004),
and show that twelve rounds are sufficient to achieve indifferentiability
from an ideal cipher. We also show that four rounds are necessary by
exhibiting attacks for three rounds or less.

Keywords: block cipher, ideal cipher, iterated Even-Mansour cipher, 
key-alternating cipher, indifferentiability. 

1 Introduction 

Block Ciphers. Block ciphers are one of the most important classes of
primitives in cryptography. They are mainly used to provide confidentiality and
authenticity to communication channels or local data storage means, but also to
construct hash functions and in other more advanced cryptographic tasks.
Syntactically, a block cipher $E$ with message space $\{0,1\}^n$ and key space $\{0,1\}^m$ is
a mapping from $\{0,1\}^m \times \{0,1\}^n$ to $\{0,1\}^n$ such that for each key $k \in \{0,1\}^m$,
$E(k, \cdot)$ is an (efficiently invertible) permutation. Block cipher designs (virtually
all of which rely on the iteration of some key-dependent round function) can be
roughly split into two families (with some rare exceptions such as IDEA):

1) Feistel networks [?] and their generalizations, where the round function is
given by $(x, y) \mapsto (y, x \oplus F(k_i, y))$, where $x$ and $y$ are the left and right
$n/2$ bits of the state, and $k_i$ is the round key; prominent examples include
DES, Blowfish, KASUMI, and Camellia for “classical” Feistel networks, and
CAST-256 and MARS for generalized Feistel networks;
2) substitution-permutation networks (SPNs), where one round generally
consists of the composition of a round-key addition, a non-linear mixing layer,
and a linear diffusion layer; notable examples include AES, SAFER, CRYPTON,
SERPENT, PRESENT, and LED.
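The Feistel round map $(x, y) \mapsto (y, x \oplus F(k_i, y))$ is invertible whatever the round function is, which a few lines of code make explicit. The round function `F` below is an arbitrary hash-based stand-in chosen for the example, and the 16-bit halves (a 32-bit toy state) are likewise an assumption.

```python
import hashlib

HALF_BITS = 16  # n/2 for a toy n = 32-bit state


def F(k: int, y: int) -> int:
    """Hypothetical round function: any map to HALF_BITS bits would do."""
    data = k.to_bytes(8, "big") + y.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:HALF_BITS // 8], "big")


def feistel_round(x: int, y: int, k: int):
    """(x, y) -> (y, x xor F(k, y))."""
    return y, x ^ F(k, y)


def feistel_round_inv(u: int, v: int, k: int):
    """Undo one round: from (u, v) = (y, x xor F(k, y)) recover (x, y)."""
    return v ^ F(k, u), u


def encrypt(x: int, y: int, round_keys):
    for k in round_keys:
        x, y = feistel_round(x, y, k)
    return x, y


def decrypt(x: int, y: int, round_keys):
    for k in reversed(round_keys):
        x, y = feistel_round_inv(x, y, k)
    return x, y
```

Note that `F` itself is never inverted: decryption only re-evaluates it, which is why the round function need not be a permutation.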

At an even higher design level, SPNs can be described (by collapsing the
non-linear mixing layer and the linear diffusion layer at the $i$-th round into a single $n$-bit
permutation $P_i$) as successive applications of round-key additions and
permutations $P_i$. Such a structure was named a key-alternating cipher by the designers
of AES [17,15].
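A toy instance of the single-key key-alternating structure (key addition, public permutation, key addition, and so on) helps fix the notation. The sketch assumes a tiny block size so that permutations can be stored as lookup tables, and draws them with a seeded pseudo-random generator purely for reproducibility; the function names are hypothetical.

```python
import random


def random_permutations(n_bits: int, rounds: int, seed: int = 0):
    """Sample `rounds` public permutations of {0, ..., 2^n - 1} as tables."""
    rng = random.Random(seed)
    perms = []
    for _ in range(rounds):
        table = list(range(1 << n_bits))
        rng.shuffle(table)
        perms.append(table)
    return perms


def invert(table):
    inv = [0] * len(table)
    for i, v in enumerate(table):
        inv[v] = i
    return inv


def iem_encrypt(k: int, x: int, perms) -> int:
    """k xor P_r(... P_2(k xor P_1(k xor x)) ...) with the same key each round."""
    y = x ^ k
    for table in perms:
        y = table[y] ^ k
    return y


def iem_decrypt(k: int, y: int, perms) -> int:
    for table in reversed(perms):
        y = invert(table)[y ^ k]
    return y ^ k
```

For every fixed key the map `x -> iem_encrypt(k, x, perms)` is a permutation, because it is a composition of XORs with a constant and permutations.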

The traditional security notion for a block cipher is pseudorandomness, i.e.
indistinguishability from a random permutation [?]: namely, no distinguisher
with reasonable resources and having black-box access to a permutation (and
also to its inverse in a more stringent variant of the security notion) should be
able to distinguish whether it is interacting with the block cipher $E(k, \cdot)$ for a
randomly chosen key $k$, or with a truly random permutation. In an asymptotic
and more theoretical language, a family of block ciphers indexed by a security
parameter meeting this security notion is called a pseudorandom permutation
(PRP), or a strong pseudorandom permutation (SPRP) when the distinguisher
also has access to the inverse permutation. The classical example of a
construction for which we have some provable security results with respect to
indistinguishability is the Feistel network. Starting from the seminal Luby-Rackoff
paper [?], which showed that the Feistel construction with three rounds yields a
PRP when its round functions are pseudorandom [25], and followed by a paper
by Patarin [?] showing that four rounds yield a SPRP (which was stated in [?]
without proof), a long series of works established refined results in the same
vein, such as [43,44,59,50] to name a few.

The Ideal Cipher Model. Though there are numerous examples where the 
standard pseudorandomness assumption is sufficient to prove (in a reductionist 
sense) the security of a cryptographic scheme (e.g. for building a symmetric
encryption scheme [3] or a MAC scheme [1]), there are also some settings where
it might not be strong enough to derive a security proof. Indeed, in some situa- 
tions, the adversary has more abilities than merely querying in a black-box way 
an encryption/decryption oracle. For example, there are some cases where the 
attacker might have access to a more powerful “related-key” oracle [91511], i.e.
it can ask encryption and decryption queries for keys that are related (in some 
limited and attack-dependent way) to the main key of the system. 

Ideally, the ultimate security goal for a block cipher would be that it “behaves” 
as a random and independent permutation for each possible key. This naturally 
leads to the so-called ideal cipher model (ICM), the origin of which can be traced 
back to Shannon [55]. In the ICM, a block cipher E with n-bit blocks and m-bit
keys is drawn at random from the set of (2^n!)^(2^m) possible block ciphers of this
form, and made available through oracle queries (for both encryption and de- 
cryption) to all parties (including the adversary). This is very similar in spirit to 
the random oracle model (ROM) [2418] used to model a perfect hash function.
To the best of our knowledge, this model was first formally used in a security
proof by Winternitz [50] and later by Merkle m to show respectively the
preimage and collision resistance of the Davies-Meyer compression function. The
ICM became increasingly popular after Black et al. [T2] used it to extensively
analyze the security of the PGV block cipher-based compression functions [51].
Since then, the ICM has been used to prove the security of a variety of other
block cipher-based hash functions |3UI31I58I40I46], of key length extension
methods for block ciphers [35I21I7I25I26], of symmetric encryption schemes [33],
and even of some public-key protocols such as signature schemes [23], ring
signature schemes [33], public-key encryption [33], and key exchange protocols
[5]. Despite these numerous successful applications, one must not lose sight of
the fact that the ICM only gives heuristic assurance, just as the ROM M. In
particular, Black m exhibited an (arguably artificial) block cipher-based hash
function which is provably collision resistant in the ICM, but becomes insecure
when the ideal cipher is instantiated with any concrete block cipher.

With the ICM at hand, the question now becomes: is it possible to argue
that a given block cipher design is as close as possible to an ideal cipher? In
the standard model, one immediately faces the problem that, unlike for
pseudorandomness, it even seems hard to come up with a satisfactory definition
of what this formally means, without running into impossibility results
(similarly to [T3] and [IT]) following from the fact that a concrete block
cipher has a short description, whereas an ideal cipher does not. This
unfortunate state of affairs has not prevented cryptanalysts from disproving
that a concrete block cipher behaves as an ideal cipher by exhibiting some
non-random behavior, i.e. some non-trivial¹ relation between inputs and outputs
of the block cipher that can be found faster than for an ideal cipher, in a
setting where the key is random and given to the attacker (known-key attacks),
or when the attacker can freely choose the key(s) (chosen-key attacks). A
classical example is the complementation property of DES which, despite often
being viewed as a “benign” undesirable property, implies that DES does not
behave as an ideal cipher. For AES, no such non-random properties were known
until Biryukov et al. [TQ] showed that so-called q-multicollisions can be found
faster for AES-256 than for an ideal cipher. Known-key and chosen-key attacks
were first put forward as an important cryptanalysis goal by Knudsen and Rijmen
[55], and have since then become an active area of research [48127154].

Indifferentiability. Though we cannot hope to formalize (not to say prove) 
that a concrete block cipher behaves as an ideal cipher in any reasonable sense in 
the standard model, it is possible to obtain positive results in idealized models, 
i.e. by viewing some subcomponent of the block cipher as perfectly random. This 
perfect subcomponent is made available to all parties as a public oracle, which 
makes this setting formally distinct from classical indistinguishability. In order 
to assess whether a cryptographic construction based on an ideal subcomponent 

1 We stress that because of the lack of a rigorous definition, the meaning of non-trivial
here is somewhat subjective.

is secure, one has to employ the formalism of indifferentiability, introduced by
Maurer et al. [35]. A construction C (e.g. a block cipher), based on some ideal
primitive F (e.g. a random permutation), is said to be indifferentiable from
some target ideal primitive G (e.g. an ideal cipher) if there exists an efficient
simulator S (with black-box access to the primitive G) such that the two systems
(C^F, F) and (G, S^G) are indistinguishable. Informally, the goal of the simulator
is to provide answers which are consistent with what a distinguisher can obtain
from G, without deviating too much from the distribution of answers of F. An
indifferentiability result can be interpreted as a way to make sure that the
high-level design of the construction C has no structural defect. More
importantly, a composition theorem [IS] asserts that if C^F is indifferentiable
from G, then any cryptosystem proved secure when used with G remains secure
when used with C^F, therefore allowing modular proofs of security in idealized
models.²

Soon after its introduction, Coron et al. US] used the indifferentiability frame- 
work to revisit the design of a hash function from an ideal cipher: namely they 
showed that a number of variants of the Merkle-Damgard domain extension 
method [19147], used with an ideal cipher in Davies-Meyer mode, are
indifferentiable from a random oracle. The converse direction, i.e. proving that it is
possible to construct an ideal cipher from a random oracle, turned out to be 
harder to achieve. A first attempt to prove that the Feistel construction with 
public random round functions is indifferentiable from a random permutation 
(and hence from an ideal cipher by prepending the key to each input to the 
random round functions) was made by Coron et al. for six rounds [TS], and later 
by Seurin for ten rounds [SS], but serious flaws were found in both proofs [37132].
The situation was corrected with a proof by Holenstein et al. [32] that the 14- 
round Feistel construction with public random round functions is indifferentiable 
from a random permutation. This must be contrasted with the classical Luby- 
Rackoff result stating that the 4-round Feistel construction with pseudorandom 
round functions yields a SPRP.

Our Contribution. The indifferentiability result for the Feistel construction 
mentioned above is fundamentally about how to obtain a random permutation 
from a random (function) oracle. The step to obtain an ideal cipher (i.e. an 
exponential number of independent permutations) is trivially achieved through 
domain separation of the underlying primitive (namely by prepending the key to 
each call to the random function oracles). However, it does not tell us anything 
about how the key should be concretely mixed into the state. In a departure from 
this approach, we ask the following question: given a small number of objects with 
n-bit inputs (e.g. n-bit permutations P_1, ..., P_r), is there a way to “combine”
them together with an m-bit key in order to obtain a construction which is
close to an n-bit block and m-bit key ideal cipher, i.e. a set of 2^m independent
permutations, without appealing to a trivial domain separation argument? This 

2 Care has to be taken with this composition result when the security definition for 
the cryptosystem puts some limitations on the adversary, such as an upper bound 
on its memory |52I20I

naturally prompts us to turn our attention towards the second class of designs,
namely key-alternating ciphers.³ More formally, we consider the construction of
a block cipher with n-bit blocks and m-bit keys from r public n-bit permutations
P_1, ..., P_r defined as follows: derive (r + 1) n-bit round keys (k_0, ..., k_r)
from a master key K through some key derivation function, and encrypt the
plaintext x ∈ {0,1}^n by computing the ciphertext y defined as:

    y = k_r \oplus P_r(k_{r-1} \oplus P_{r-1}(\cdots P_2(k_1 \oplus P_1(k_0 \oplus x)) \cdots)).

When r = 1 and two independent n-bit keys (k_0, k_1) are used, so that the
ciphertext is simply y = k_1 ⊕ P_1(k_0 ⊕ x), one obtains the so-called
Even-Mansour cipher [22]. When P_1 is modeled as a public random permutation
(that the adversary can query in a black-box way), Even and Mansour [22] showed
that the resulting block cipher is a SPRP, with security ensured up to
O(2^{n/2}) distinguisher queries. The indistinguishability of the general
construction for r > 1 with independent keys (k_0, ..., k_r) was later studied
for two rounds by Bogdanov et al. na, for three rounds by Steinberger 153, and
for any number r of rounds (with non-tight security bounds) by Lampe et al.
[35]. Unsurprisingly, the number of adversarial queries up to which the
key-alternating cipher is indistinguishable from a random permutation increases
with the number of rounds. Following |38j, and to emphasize that we work in the
random permutation model for P_1, ..., P_r, we will use the name r-round
iterated Even-Mansour cipher to designate the idealized key-alternating cipher
where the permutations P_1, ..., P_r are public and perfectly random
permutation oracles.

In this paper, we consider the iterated Even-Mansour cipher from the point of 
view of indifferentiability, and ask whether this construction is indifferentiable 
from an ideal cipher for a sufficient number of rounds when the permutations 
P_1, ..., P_r are public and random. A first simple observation is that the
construction with r + 1 independent n-bit keys (k_0, ..., k_r) (resulting in a
total key space {0,1}^m = {0,1}^{(r+1)n}) is never indifferentiable (for any r)
from an ideal cipher with n-bit blocks and (r + 1)n-bit keys (this had already
been informally observed by [12]). In a sense, independent keys offer too much
freedom to the attacker, making it easy to find related-key relations. There
are two possible approaches to solve this problem. The first one is to derive
the round keys (k_0, ..., k_r) from the master key using some cryptographic
function (modeled as a random oracle for the indifferentiability proof). This
was considered in an earlier and independent work by Andreeva et al. [2] (see
below for a discussion of their result). The second possibility (not relying on
any cryptographic assumption about the key derivation function) is to
“correlate” the round keys. This is the approach we adopt: namely, we consider
the iterated Even-Mansour cipher where the n-bit round keys (k_0, ..., k_r) are
obtained by applying efficiently invertible n-bit permutations (γ_0, ..., γ_r)
to the n-bit master key k (see Figure 1). As will appear clearly in view of its
proof, the fact that the master
key length is equal to the block length is crucial for our result. To insist on this 

3 One could certainly undertake the same study for Feistel-based block ciphers, but 
this seems more complicated.

particular point, we call this construction the single-key iterated Even-Mansour
cipher. Our main result is the following one. 

Theorem. The 12-round, single-key iterated Even-Mansour cipher with twelve
independent random public n-bit permutations (P_1, ..., P_12) and any
efficiently invertible (public) n-bit permutations (γ_0, ..., γ_12) for the key
schedule is indifferentiable from an ideal cipher with n-bit blocks and n-bit
keys.

In fact, the key derivation permutations γ_i will not play any role in the
proof, so that we will focus on the simple case where they are all equal to the
identity. Additionally, we show that at least four rounds are necessary by
describing attacks (using only a constant number of queries) for three rounds
or fewer (see the full version of the paper [5§]).

Together with the result of [2] discussed below, our main theorem validates 
the design strategy underlying SPNs and more generally key-alternating ciphers 
as a sound way to ensure security beyond pseudorandomness: it (theoretically)
makes it possible to achieve resistance against related-key, known-key and
chosen-key attacks (that an ideal cipher can withstand). We stress that our
result cannot be used as is to make concrete design decisions: first, our
bounds (as is often the case with indifferentiability results) are extremely
loose.⁴ More importantly, the permutations P_i used in concrete block ciphers
such as AES are often too simple to be deemed close to random permutations (not
to say independent: they are often the same).

Our Techniques. The techniques used to prove our main theorem are very 
similar to the ones introduced in |lfil55l.'12] for the Feistel construction (while
the formalism we adopt is very close to [31]). We simply give a very cursory
overview of the main ideas here (assuming all γ_i’s are the identity). The
simulator works by detecting and completing “partial chains” created by the
queries of the distinguisher. Define the computation path for a plaintext x and
a key k as the sequence of pairs (x_1, y_1), ..., (x_12, y_12) of corresponding
input and output values for the simulated permutations P_1, ..., P_12. It must
hold that the value y obtained through this computation path matches the value
E(k, x) obtained from the ideal cipher, otherwise one could straightforwardly
distinguish the “simulated” world from the “real” world. Hence, simply
answering the distinguisher queries randomly will not work: the simulator must
somehow “adapt” the computation path to match the ideal cipher E. Observe now
the following important property of the single-key iterated Even-Mansour
cipher: given only two consecutive values y_i and x_{i+1} of the computation
path (i.e. the output value of permutation P_i and the input value to
permutation P_{i+1}), it is possible to deduce the corresponding key
k = y_i ⊕ x_{i+1}, and hence to move forward and backward along the path. Note
that this property essentially relies on the fact that the master key length is
equal to the block length of the permutations (were the master key longer, it
could not be uniquely determined by

4 Since the proof is already quite involved, we favored simplicity rather than tightness,
but the bounds can probably be improved in some places.

y_i and x_{i+1}). Note also that this is the exact analogue of the property of the
Feistel network that the input and output values to two consecutive round
functions make it possible to move uniquely forward and backward inside the
construction.
With this in mind, the strategy of the simulator will be to detect partial chains 
in computation paths created by queries of the distinguisher to two consecutive 
permutations, and “complete” them by moving forward and backward inside the 
iterated Even-Mansour construction (randomly setting undefined permutation 
values encountered along the way, and making a call to the ideal cipher to “wrap 
around”) until the input x_ℓ and the output y_ℓ for one particular permutation
P_ℓ are obtained (but still undefined in the history of P_ℓ). This permutation
is then “adapted” by setting P_ℓ(x_ℓ) := y_ℓ so that the corresponding input and
output for the simulated Even-Mansour cipher and for the ideal cipher match. A
moment of thought should make clear that the simulator cannot complete each and
every partial chain created in its history, since this would create a “chain
reaction” leading to an exponential running time and an exponential number of
ideal cipher queries from the simulator. Hence, one must make a careful and
parsimonious choice of “detection zones” for deciding which partial chains to
complete. In addition, one must ensure that the simulator never overwrites an
entry when adapting permutation P_ℓ, thereby rendering a previously completed
chain inconsistent. How exactly this is done is very similar to the case of the
Feistel construction )55132j, and we refer to Section 3.1 for a more detailed
overview.

As a retrospective afterthought, we note that the Feistel and the iterated 
Even-Mansour indifferentiability results are not that far apart: they both tell 
how to construct a “big object” (which in both cases has some specific syntactic 
constraints which are relevant only from a cryptographic perspective) taking 2n
bits of input (the left and right n-bit halves of the input in the case of the Feistel 
network, and the key and the plaintext in the case of the iterated Even-Mansour 
cipher) from smaller objects with only n bits of input (fourteen n-bit to n-bit 
functions for the Feistel network, and twelve n-bit permutations for the iterated 
Even-Mansour cipher). 

Related Work. In a prior and independent work [2], Andreeva et al. proved 
a result which is close and complementary to ours: they showed that the iter- 
ated Even-Mansour construction with five rounds and a key derivation function 
modeled as a random oracle is indifferentiable from an ideal cipher. Though sig- 
nificantly reducing the number of rounds required for the proof to go through, 
and lifting the restriction that the master key length be equal to the block length 
of the permutations, their technique puts a strong burden on the key derivation 
function, which can hardly be seen as close to a random oracle in most con- 
crete block ciphers. In fact, most key schedules, such as the one of AES, are 
“lightweight” and invertible, which makes our result (where the key derivation 
function has no cryptographic role) more relevant to practice. On the other 
hand, the bounds obtained by [2] are better: the number of queries, the running
time, and the indistinguishability bound achieved by their simulator are
respectively O(q^2), O(q^3), and O(q^{10}/2^n), while for our simulator they are
respectively O(q^4), O(q^6), and O(q^{12}/2^n).

Taken together, the two results indicate, not too surprisingly, that using a
cryptographically strong key schedule, though not being necessary, makes it
possible to lower the number of rounds needed to obtain an ideal cipher
(however this interpretation must be taken cautiously: it may well be that,
say, the iterated
Even-Mansour cipher with four rounds is indifferentiable from an ideal cipher, 
independently of the cryptographic strength of the key schedule). 

Regarding the purely theoretical question of the minimal number of n-bit 
permutations needed to construct an n-bit block and n-bit key ideal cipher, it 
was additionally shown in [5] that six independent permutations are sufficient, by
using a 5-round key-alternating cipher and an independent random permutation
P_0 to build a key derivation function k ↦ P_0(k) ⊕ k.

2 Preliminaries 

2.1 Notation and Definitions 

Given a finite non-empty set S, we write s ←$ S to mean that a value is sampled
uniformly at random from S and assigned to s. The security parameter will be
denoted n and will be identified with the block length of permutations in the
Even-Mansour construction. We will write f ∈ poly(n) to denote a polynomially
bounded function and f ∈ negl(n) to denote a negligible function. For δ ∈
{+, −}, we denote by δ̄ the opposite of δ.

In the following, we will use calligraphic fonts (A, B, ...) to denote interactive
Turing machines, and typewriter fonts to denote procedures attached to these
machines. A distinguisher is an oracle Turing machine D which takes as input a
security parameter 1^n, has access to a set of oracles O_1, ..., O_m, and outputs
a bit b, an experiment we denote D^{O_1,...,O_m} = b. We will always consider
distinguishers
that are deterministic and computationally unbounded, and restricted only with 
respect to the number of oracle queries they make. 

An ideal primitive is a probability distribution on some set of functions, and 
will be denoted with bold fonts. In the corresponding model, a function is drawn 
at random from the corresponding distribution (say F) and all parties (say M)
involved in the security experiment are given oracle access to the corresponding
function, which we simply denote M^F. In the following we will consider the
following two ideal primitives:

– a random permutation P_i on {0,1}^n, which is a permutation drawn at random
from the set of all permutations on {0,1}^n, and which can be accessed in the
two directions P_i(x) and P_i^{-1}(y); we will use the notation
P = (P_1, ..., P_r) to denote a tuple of independent random permutations;

– an ideal cipher E with message space and key space {0,1}^n, which is drawn
at random from the set of all block ciphers of this form, and which can be
accessed in encryption, denoted E(k, x), and decryption, denoted E^{-1}(k, y).
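
As an illustration (not part of the original text), both ideal primitives can be emulated by lazy sampling: an answer is drawn only when a fresh query arrives, and is stored so that later queries in either direction stay consistent. The sketch below assumes a toy block length, and the class names `LazyRandomPermutation` and `LazyIdealCipher` are hypothetical; it is only meant to make the oracle interfaces concrete.

```python
import random


class LazyRandomPermutation:
    """Random permutation on n-bit strings, sampled lazily, queryable both ways."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.fwd = {}  # x -> P(x)
        self.bwd = {}  # y -> P^{-1}(y)

    def _fresh(self, taken):
        # Rejection-sample a value that is not yet used on the target side.
        while True:
            v = self.rng.getrandbits(self.n)
            if v not in taken:
                return v

    def forward(self, x):
        if x not in self.fwd:
            y = self._fresh(self.bwd)
            self.fwd[x], self.bwd[y] = y, x
        return self.fwd[x]

    def inverse(self, y):
        if y not in self.bwd:
            x = self._fresh(self.fwd)
            self.fwd[x], self.bwd[y] = y, x
        return self.bwd[y]


class LazyIdealCipher:
    """Ideal cipher: one independent lazily sampled permutation per key."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.perms = {}

    def _perm(self, k):
        if k not in self.perms:
            self.perms[k] = LazyRandomPermutation(self.n, self.rng)
        return self.perms[k]

    def enc(self, k, x):
        return self._perm(k).forward(x)

    def dec(self, k, y):
        return self._perm(k).inverse(y)


# Toy usage with 8-bit blocks and keys (illustrative parameters only).
rng = random.Random(2013)
E = LazyIdealCipher(n=8, rng=rng)
y = E.enc(0x3C, 0x05)
```

Decrypting `y` under the same key returns the original input, and for each fixed key the answers form a permutation of the block space.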

2.2 Indifferentiability 

We recall the usual definition of indifferentiability.

Definition 1. Let q, σ, t : ℕ → ℕ and ε : ℕ → ℝ be four functions of the security
parameter n. A Turing machine C with oracle access to an ideal primitive F
is said to be statistically and strongly (q, σ, t, ε)-indifferentiable from an
ideal primitive G if there exists an interactive Turing machine S with oracle
access to G such that for any distinguisher D making at most q queries, S makes
at most σ oracle queries, runs in time at most t, and the following holds:

    \left| \Pr\left[ D^{G, S^G} = 1 \right] - \Pr\left[ D^{C^F, F} = 1 \right] \right| \le \varepsilon.

C^F is simply said to be statistically and strongly indifferentiable from G if for
any q ∈ poly(n), the above definition is fulfilled with σ, t ∈ poly(n) and
ε ∈ negl(n).

This definition does not refer to the running time of D. When only
polynomial-time distinguishers are considered, indifferentiability is said to be
computational. Weak indifferentiability is defined as above, but the order of
quantifiers for the distinguisher and the simulator is switched (for every
distinguisher, there is a simulator...).

In this paper, and similarly to m, we will slightly tweak the definition of
strong indifferentiability as follows: we will describe a simulator which, for any
distinguisher D making a polynomial number q of queries, runs in time at most
t and makes at most σ queries with overwhelming probability (rather than
probability one) in system D^{G, S^G}. This is not a big concern since any such
simulator S can be transformed into a simulator S′ for weak indifferentiability
(which is sufficient for the composition theorem of m to hold) which takes the
maximal number of queries q of D as input, and aborts when its number of
queries becomes larger than σ (computed as a function of q), hence making at
most σ queries with probability one.

2.3 The Iterated Even-Mansour Cipher 

Fix an integer r ≥ 1. Let P = (P_1, ..., P_r) be a tuple of permutations on {0,1}^n.
The r-round iterated Even-Mansour construction associated with P, denoted C^P,
is the block cipher with message space {0,1}^n and key space ({0,1}^n)^{r+1} which
maps a message x and a key (k_0, ..., k_r) to the ciphertext defined by:

    C^P((k_0, \ldots, k_r), x) = k_r \oplus P_r(k_{r-1} \oplus P_{r-1}(\cdots P_2(k_1 \oplus P_1(k_0 \oplus x)) \cdots)).
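
A minimal sketch of this construction, under assumptions of our own (a toy 16-bit block length, permutations stored as lookup tables, and hypothetical function names), may help fix ideas. It also checks two simple related-key relations that hold for any choice of the permutations when the round keys are independent. These two relations are chosen for illustration only and are not claimed to be the ones used in the paper; for an ideal cipher, each would hold only with negligible probability.

```python
import random

N = 16  # toy block length in bits (illustrative assumption)


def random_permutation(rng):
    """Return a uniformly random permutation of {0,1}^N as a lookup table."""
    table = list(range(1 << N))
    rng.shuffle(table)
    return table


def iterated_even_mansour(perms, keys, x):
    """Compute k_r ^ P_r(k_{r-1} ^ ... P_2(k_1 ^ P_1(k_0 ^ x)) ...)."""
    assert len(keys) == len(perms) + 1
    state = x ^ keys[0]
    for p, k in zip(perms, keys[1:]):
        state = p[state] ^ k
    return state


rng = random.Random(1)
r = 3
perms = [random_permutation(rng) for _ in range(r)]
keys = [rng.getrandbits(N) for _ in range(r + 1)]
x = rng.getrandbits(N)
delta = 0x1234
reference = iterated_even_mansour(perms, keys, x)

# Relation 1: shifting k_0 and the plaintext by the same delta changes nothing.
keys_a = [keys[0] ^ delta] + keys[1:]
same = iterated_even_mansour(perms, keys_a, x ^ delta) == reference

# Relation 2: shifting k_r by delta shifts the ciphertext by delta.
keys_b = keys[:-1] + [keys[-1] ^ delta]
shifted = iterated_even_mansour(perms, keys_b, x) == (reference ^ delta)
```

Both `same` and `shifted` evaluate to true whatever the permutations are, which is the kind of freedom that correlated round keys are meant to remove.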

Let γ = (γ_0, ..., γ_r) be a tuple of efficiently invertible permutations on {0,1}^n.
The single-key r-round iterated Even-Mansour construction associated with P
and γ, denoted C^{P,γ}, is the block cipher with message space {0,1}^n and key
space {0,1}^n which maps a message x and a key k to the ciphertext defined by
(see Figure 1):

    C^{P,\gamma}(k, x) = \gamma_r(k) \oplus P_r(\gamma_{r-1}(k) \oplus P_{r-1}(\cdots P_2(\gamma_1(k) \oplus P_1(\gamma_0(k) \oplus x)) \cdots)).

In the following, we will focus on the case where all permutations γ_i are the
identity, and simply denote C^P the resulting cipher, namely:

    C^P(k, x) = k \oplus P_r(k \oplus P_{r-1}(\cdots P_2(k \oplus P_1(k \oplus x)) \cdots)).

We stress that our main result (Theorem 1) holds for arbitrary permutations γ_i
as long as they are efficiently invertible.


Fig. 1. The single-key iterated Even-Mansour cipher with r rounds C^{P,γ}. We focus in
this paper on the special case γ_i = Id for i = 0, ..., r.


3 Indifferentiability for Twelve Rounds 

In this section we prove the main result of this paper, which is the following 
theorem. 

Theorem 1. For any q, the 12-round, single-key iterated Even-Mansour cipher
C_{12}^{P,γ} with twelve independent random n-bit permutations P = (P_1, ..., P_12), and
fixed, efficiently invertible n-bit permutations γ = (γ_0, ..., γ_12) for the key
schedule, is strongly and statistically (q, σ, t, ε)-indifferentiable from an ideal
cipher E with n-bit blocks and n-bit keys, where:

    \sigma = 2^{7} \times q^{4}, \quad t = O(q^{6}), \quad \text{and} \quad \varepsilon = \frac{2^{91} \times q^{12}}{2^{n}}.

To prove this, we will describe an efficient simulator S, and show that the two
systems (C_{12}^{P,γ}, P) and (E, S^E) are indistinguishable. For simplicity we focus on
the case where all γ_i’s are the identity, but the generalization is straightforward.
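
To get a feeling for these quantities, one can plug numbers into the expressions of Theorem 1. The helper names below are ours, and the bound is read as ε = 2^91 × q^12 / 2^n, in line with the O(q^12 / 2^n) term quoted in the related-work discussion; this is a back-of-the-envelope sketch working with base-2 logarithms, not part of the proof.

```python
from fractions import Fraction


def log2_epsilon(n, log2_q):
    """log2 of eps = 2^91 * q^12 / 2^n, for q = 2^log2_q."""
    return 91 + 12 * log2_q - n


def log2_sigma(log2_q):
    """log2 of sigma = 2^7 * q^4, the bound on simulator queries."""
    return 7 + 4 * log2_q


def max_log2_queries(n):
    """Value of log2(q), as an exact fraction, at which eps reaches 1."""
    return Fraction(n - 91, 12)
```

For n = 128, for instance, ε stays below 1 only while log2(q) < 37/12, i.e. for roughly 8 distinguisher queries, which illustrates how loose the bound is for concrete block lengths.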

Notational Convention. Throughout this section, we will use the following
notational convention: we will interchangeably denote the input to the ideal
cipher or the iterated Even-Mansour cipher x or y_0, and the output y or x_13.


3.1 Informal Description of the Simulator 

We start with a high-level view of the simulator (see also Figure 2). It offers
an interface to the distinguisher for querying the simulated permutations, which
formally takes the form of a public procedure Query(i, δ, z), where
i ∈ {1, ..., 12} names the permutation, δ ∈ {+, −} tells whether this is a
direct or inverse query, and z ∈ {0,1}^n is the actual value queried. The
simulator maintains a history for the simulated permutations in the form of
hash tables P_1, ..., P_12. Each such table maps entries (δ, z) ∈ {+, −} × {0,1}^n
to values z′ ∈ {0,1}^n. We denote by P_i^+, resp. P_i^−, the (time-dependent)
sets of strings z ∈ {0,1}^n such that P_i(+, z), resp. P_i(−, z), is defined.
When the simulator receives a query (i, δ, z), it looks in hash table P_i to see
whether the corresponding answer P_i(δ, z) is already defined. When this is the
case, it outputs the answer and waits for the next query. Otherwise, it draws a
uniformly random answer z′ and defines in hash table P_i(δ, z) := z′ as well as
the answer to the opposite query P_i(δ̄, z′) := z (note that this last
assignment may overwrite an entry in P_i).

Additionally, before outputting the answer z′, and for some specific values of
(i, δ), the simulator triggers a chain detection mechanism followed by a chain
completion mechanism to ensure consistency of its answers with the ideal cipher
E. An essential point to notice about the iterated Even-Mansour cipher in order
to understand these mechanisms is that given an output value y_i for permutation
P_i and an input value x_{i+1} for permutation P_{i+1}, it is possible to compute the
corresponding key k = y_i ⊕ x_{i+1}, and therefore to move forward and backward
in the construction up and down to the corresponding input x and output y
to the cipher. Hence, any tuple (y_i, x_{i+1}, i) (a so-called partial chain later in
the reasoning) defines a unique computation path inside the whole construction.
This is the exact analogue of the property of the Feistel construction that the
input values to two consecutive round functions uniquely define the computation
path inside the Feistel network.
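
The following sketch illustrates this property on fully defined toy permutations (12-bit blocks, lookup tables with their inverses; the function names are hypothetical and the permutations are not simulator tables). Starting from a partial chain (y_i, x_{i+1}, i), it recovers the key and rebuilds the whole computation path.

```python
import random

N = 12  # toy block length in bits (illustrative assumption)
R = 12  # number of rounds


def random_permutation(rng):
    """Return (table, inverse_table) for a random permutation of {0,1}^N."""
    table = list(range(1 << N))
    rng.shuffle(table)
    inverse = [0] * (1 << N)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def computation_path(perms, k, x):
    """Return [(x_1, y_1), ..., (x_R, y_R)] and the ciphertext for key k, input x."""
    path = []
    state = x ^ k                 # x_1 = y_0 xor k, with y_0 = x
    for fwd, _ in perms:
        out = fwd[state]
        path.append((state, out))
        state = out ^ k           # x_{j+1} = y_j xor k
    return path, state            # the final state is x_{R+1}, i.e. y


def path_from_partial_chain(perms, y_i, x_next, i):
    """Rebuild the key and the full path from (y_i, x_{i+1}, i), 1 <= i < R."""
    k = y_i ^ x_next
    y = y_i
    for j in range(i, 0, -1):     # move backward down to the cipher input y_0
        x_j = perms[j - 1][1][y]
        y = x_j ^ k               # y_{j-1}
    return k, computation_path(perms, k, y)


rng = random.Random(7)
perms = [random_permutation(rng) for _ in range(R)]
k, x = rng.getrandbits(N), rng.getrandbits(N)
path, y = computation_path(perms, k, x)
y6, x7 = path[5][1], path[6][0]   # output of P_6 and input of P_7
k_rec, (path_rec, y_rec) = path_from_partial_chain(perms, y6, x7, 6)
```

Here `k_rec`, `path_rec` and `y_rec` coincide with the key, path and ciphertext that produced the pair (y_6, x_7).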

There are exactly six such values of (i, δ) for which the simulator performs
additional steps: (2, +), (6, +), (6, −), (7, +), (7, −), and (11, −). The cases
(2, +) and (11, −) are similar. When receiving a query (2, +, x_2) for which the
answer is still undefined, the simulator, after having drawn a random answer y_2
to this query, considers all values y_1 ∈ P_1^−, computes the corresponding key
k := y_1 ⊕ x_2, and moves backward in the iterated Even-Mansour cipher by
computing x_1 := P_1(−, y_1), y_0 := x_1 ⊕ k, x_13 := E(k, y_0) (hence making a
query to the ideal cipher), and y_12 := x_13 ⊕ k, and checks whether
y_12 ∈ P_12^−. When this is the case, it enqueues in a queue Queue the tuple
(y_0, x_1, 0, 4). The first three elements (y_0, x_1, 0) specify the partial
chain that must be completed, while the last element ℓ = 4 specifies which
permutation will be adapted during completion of the chain to ensure
consistency with E. The behavior of the simulator when receiving a query
(11, −, y_11) is symmetric: after having drawn a random answer x_11, for all
x_12 ∈ P_12^+, it moves forward in the iterated Even-Mansour cipher to check
whether the corresponding value x_1 is in P_1^+, and if so enqueues the
corresponding tuple (y_0, x_1, 0, 9) (note that in this case adaptation will
take place at permutation P_9).

The four remaining cases (i, δ) = (6, +), (6, −), (7, +), and (7, −) are similar,
except that there is no check: the simulator enqueues a tuple (y_6, x_7, 6, ℓ) for
each newly generated pair (y_6, x_7) ∈ P_6^− × P_7^+. If this was a query with i = 6,
then adaptation will take place at ℓ = 4, while if this was a query with i = 7,
adaptation will take place at ℓ = 9. Assume for a concrete example that the
simulator receives a query (6, +, x_6) whose answer is not yet defined. Then it
draws a random answer y_6 ←$ {0,1}^n, and enqueues (y_6, x_7, 6, 4) for all x_7 ∈ P_7^+.
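
The middle detection zone can be sketched as follows. This is a simplified illustration and not the simulator of the full version: only the tables of P_6 and P_7 are modeled, the function name `query_middle` and the 32-bit block length are assumptions, and chain completion is left out entirely.

```python
import random
from collections import deque

N = 32  # toy block length in bits (illustrative assumption)


def query_middle(tables, queue, i, d, z, rng):
    """Answer query (i, d, z) for i in {6, 7}; enqueue newly created middle chains."""
    assert i in (6, 7) and d in ('+', '-')
    if (d, z) in tables[i]:
        return tables[i][(d, z)]              # already defined: no new chain
    ans = rng.getrandbits(N)                  # fresh uniform answer
    opp = '-' if d == '+' else '+'
    tables[i][(d, z)] = ans
    tables[i][(opp, ans)] = z
    if i == 6:
        y6 = ans if d == '+' else z           # newly defined output of P_6
        for (dd, x7) in list(tables[7]):
            if dd == '+':
                queue.append((y6, x7, 6, 4))  # adaptation will happen at P_4
    else:
        x7 = z if d == '+' else ans           # newly defined input of P_7
        for (dd, y6) in list(tables[6]):
            if dd == '-':
                queue.append((y6, x7, 6, 9))  # adaptation will happen at P_9
    return ans


rng = random.Random(0)
tables, queue = {6: {}, 7: {}}, deque()
y6 = query_middle(tables, queue, 6, '+', 5, rng)  # P_7 is empty: nothing enqueued
x7 = query_middle(tables, queue, 7, '-', 9, rng)  # pairs up with the known y6
```

After these two queries the queue holds the single tuple (y6, x7, 6, 9), since the pair was created by a query to P_7.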

Immediately after having enqueued newly created chains (y_i, x_{i+1}, i, ℓ), the
simulator starts completing the partial chains, by dequeuing tuples from Queue.
For this, when dequeuing (y_i, x_{i+1}, i, ℓ), it computes the key k := y_i ⊕ x_{i+1},
and moves forward and backward in the iterated Even-Mansour cipher, possibly
defining missing permutation values P_j(+, ·) or P_j(−, ·), and making a query
to E(k, ·) to “wrap around”, until it reaches the input value x_ℓ for P_ℓ (when
moving forward) and the corresponding output y_ℓ (when moving backward). It
finally “adapts” permutation P_ℓ by setting P_ℓ(+, x_ℓ) := y_ℓ and P_ℓ(−, y_ℓ) := x_ℓ
in order to ensure consistency of the entire chain with E. It also adds chains
that have been completed to a set Completed in order to avoid completing
them twice. While completing a chain and adding possibly missing permutation
values, the simulator uses the same chain detection mechanism as when receiving
a direct query from the distinguisher. Hence new tuples may be enqueued while
dequeuing and completing a chain, and the simulator keeps dequeuing tuples
until the queue is empty. When this is the case, it returns the answer to the
original query of the distinguisher.
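
The completion and adaptation of a single chain can be sketched as follows. This is a deliberately stripped-down illustration under our own assumptions (32-bit toy blocks, hypothetical names `Tables`, `IdealCipher`, `complete_chain` and `evaluate`, identity key schedule): it omits chain detection, the queue, the set Completed and any handling of overwritten entries, all of which the actual simulator of the full version must deal with.

```python
import random

N = 32  # toy block length in bits (illustrative assumption)
R = 12  # number of rounds


class Tables:
    """Simulator history: t[i][('+', x)] = y and t[i][('-', y)] = x."""

    def __init__(self, rng):
        self.rng = rng
        self.t = {i: {} for i in range(1, R + 1)}

    def set(self, i, d, z, z2):
        opp = '-' if d == '+' else '+'
        self.t[i][(d, z)] = z2
        self.t[i][(opp, z2)] = z          # may overwrite on a collision

    def get(self, i, d, z):
        if (d, z) not in self.t[i]:       # undefined: draw a uniform answer
            self.set(i, d, z, self.rng.getrandbits(N))
        return self.t[i][(d, z)]


class IdealCipher:
    """Lazily sampled ideal cipher with N-bit blocks and keys."""

    def __init__(self, rng):
        self.rng = rng
        self.e, self.d = {}, {}

    def enc(self, k, x):
        if (k, x) not in self.e:
            y = self.rng.getrandbits(N)
            while (k, y) in self.d:
                y = self.rng.getrandbits(N)
            self.e[(k, x)], self.d[(k, y)] = y, x
        return self.e[(k, x)]

    def dec(self, k, y):
        if (k, y) not in self.d:
            x = self.rng.getrandbits(N)
            while (k, x) in self.e:
                x = self.rng.getrandbits(N)
            self.e[(k, x)], self.d[(k, y)] = y, x
        return self.d[(k, y)]


def complete_chain(tables, E, y_i, x_next, i, ell):
    """Complete the partial chain (y_i, x_{i+1}, i) and adapt P_ell."""
    k = y_i ^ x_next
    j, x = i + 1, x_next                  # forward, up to the input x_ell
    while j != ell:
        if j == R + 1:                    # x is the cipher output x_13
            x, j = E.dec(k, x) ^ k, 1     # y_0, then x_1 = y_0 xor k
        else:
            x, j = tables.get(j, '+', x) ^ k, j + 1
    j, y = i, y_i                         # backward, down to the output y_ell
    while j != ell:
        if j == 0:                        # y is the cipher input y_0
            y, j = E.enc(k, y) ^ k, R     # x_13, then y_12 = x_13 xor k
        else:
            y, j = tables.get(j, '-', y) ^ k, j - 1
    tables.set(ell, '+', x, y)            # adapt: P_ell(+, x_ell) := y_ell
    return k


def evaluate(tables, k, x):
    """Evaluate the single-key 12-round cipher on the simulated tables."""
    state = x ^ k
    for j in range(1, R + 1):
        state = tables.get(j, '+', state) ^ k
    return state


rng = random.Random(2013)
tables, E = Tables(rng), IdealCipher(rng)
y6, x7 = rng.getrandbits(N), rng.getrandbits(N)
k = complete_chain(tables, E, y6, x7, 6, 4)   # middle chain, adapted at P_4
```

Once the chain is completed, evaluating the construction on the simulated tables for key `k` agrees with the ideal cipher on the corresponding input, which is exactly the consistency the adaptation step is meant to enforce.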

As in the indifferentiability proof of the Feistel construction, there will be two 
crucial points to show: first, that the recursive chain completion mechanism ter- 
minates in polynomial time (except maybe with negligible probability); second, 
that the simulator can always adapt, i.e. that it never has (or only with negligi- 
ble probability) to overwrite previously defined entries when adapting a chain, 
which would render previously completed chains inconsistent with the ideal
cipher E. Permutations P_3, P_5, P_8, and P_10 (i.e. the permutations surrounding
the two adaptation rounds P_4 and P_9) will play a key role while proving this
last point: they will ensure that no bad collisions occur at the input or output 
of the two permutations used for adapting chains. 

We defer the formal definition of the simulator to the full version of the
paper [35].

3.2 Sketch of the Proof of Theorem 1

We sketch the main ideas of the proof of Theorem 1. The detailed proof is
deferred to the full version of the paper [33].

We use intermediate systems that are depicted in Figure 3. System $\Sigma_1$ is
the simulated world $(E, S^E)$, while $\Sigma_4$ is the real world $(\mathsf{EM}^P, P)$. In system $\Sigma_2$,
the ideal cipher E is replaced with a so-called keyed two-sided random function
$F(\eta)$ which offers the same interface for encryption and decryption as the ideal
cipher. However, when asked for an encryption query $(k, x)$ or a decryption
query $(k, y)$, $F$ first checks (by maintaining a hash table denoted E) whether
this value appeared in a previous query, and if so answers consistently. Otherwise
it draws a uniformly random answer (the randomness is made explicit through a
uniformly random table $\eta$) and updates E. Besides, $F$ has an additional interface
$F$.Check$(k, x, y)$ (only used by the simulator) which returns true if and only if
$E(+, k, x) = y$ or $E(-, k, y) = x$ (in particular, if neither $(k, x)$ was queried
for encryption nor $(k, y)$ for decryption, Check$(k, x, y)$ returns false). In $\Sigma_2$, the
simulator S is slightly modified into a new simulator T which queries Check
rather than the encryption or decryption interface when deciding whether a
tuple $(y_0, x_1, 0, \ell)$ must be enqueued. Moreover the randomness of T is made
explicit with uniformly random tables $\varphi = (\varphi_1, \ldots, \varphi_{12})$. In system $\Sigma_3$, the
keyed two-sided random function is replaced with an iterated Even-Mansour
cipher using uniformly random permutations $\pi = (\pi_1, \ldots, \pi_{12})$, enhanced with
a Check procedure similarly to $F$. The simulator T now uses tables $\pi$ as well for
its random draws.
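The interface of the keyed two-sided random function can be sketched as follows. This is only an illustration under stated assumptions: the class name is hypothetical, a seeded generator stands in for the random table $\eta$, and recording each fresh answer in both directions (with a possible overwrite) is an assumption consistent with the overwrites mentioned in Remark 1.

```python
import random


class TwoSidedRandomFunction:
    """Toy keyed two-sided random function with a Check interface."""

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = random.Random(seed)   # stands in for the random table eta
        self.E = {}                      # hash table E: (direction, k, value) -> answer

    def _query(self, direction, k, v):
        if (direction, k, v) not in self.E:
            # Uniform answer: unlike an ideal cipher, not forced to be a permutation.
            w = self.rng.randrange(1 << self.n)
            opposite = "-" if direction == "+" else "+"
            self.E[(direction, k, v)] = w
            self.E[(opposite, k, w)] = v     # may overwrite an entry (a bad event)
        return self.E[(direction, k, v)]

    def enc(self, k, x):
        return self._query("+", k, x)

    def dec(self, k, y):
        return self._query("-", k, y)

    def check(self, k, x, y):
        # True iff E(+,k,x) = y or E(-,k,y) = x; never draws randomness.
        return self.E.get(("+", k, x)) == y or self.E.get(("-", k, y)) == x


f = TwoSidedRandomFunction(16)
before = f.check(1, 2, 3)     # nothing queried yet
y = f.enc(1, 2)
after = f.check(1, 2, y)
```

The point of `check` is that it only reads the table: a simulator that calls it learns whether a triple was already defined by earlier queries, without creating new entries.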

To prove Theorem 1, we will upper bound the statistical distance between
successive worlds $\Sigma_i$. Additionally, we must show that S makes a polynomial
number of oracle queries and runs in polynomial time in $\Sigma_1$ with overwhelming
probability. We start the analysis in $\Sigma_2$: namely we show that in this system, T
will always complete at most q chains of the form $(y_0, x_1, 0, \ell)$. The reason for this
is quite simple: since T uses interface $F$.Check to decide whether such a tuple
must be enqueued, such a chain can be detected and enqueued only if $(k, y_0)$
with $k = y_0 \oplus x_1$ appeared in the queries (or the answers) of the distinguisher to
$F$. Since by assumption the distinguisher makes at most q queries, this implies
the result. Starting from this observation, one can then upper bound the size of
the hash tables $P_i$ maintained by the simulator as well as the number of queries
of T to $F$.

We then upper bound the statistical distance between $\Sigma_1$ and $\Sigma_2$. For this,
we appeal to a previous result from [35] to obtain the following lemma.

Lemma 1. For any distinguisher $\mathcal{D}$ which makes at most q queries in total, we
have:

|Pr [V Sl = 1] - Pr = fj | < g . 

As a side result, this directly implies that with overwhelming probability, S
runs in polynomial time and makes a polynomial number of queries to E in
system $\Sigma_1$, as captured by the following lemma.

Lemma 2. Assume that the distinguisher $\mathcal{D}$ makes at most q queries in total.
Then with probability greater than $1 - 2^{21} \times q^{12}/2^n$ over an execution of $\mathcal{D}^{\Sigma_1}$,
the simulator S makes at most $2^7 \times q^4$ queries to $E$ or $E^{-1}$ (assuming S never
repeats a query), and runs in time at most $O(q^6)$.

We then move to the hard part of the proof, which is to upper bound the
statistical distance between $\Sigma_2$ and $\Sigma_3$. For this, an important first step is to show
that in $\Sigma_2$, the simulator never (more precisely only with negligible probability)
overwrites an entry in hash tables $P_i$ during a call to ForceVal (i.e. the
procedure which adapts chains by forcing the value of permutation $P_4$ or $P_9$).
To reason about the behavior of system $\Sigma_2$, we introduce the concept of partial
chain, which is simply a tuple $(y_i, x_{i+1}, i)$ for $i \in \{1, \ldots, 12\}$. Considering, at
some point in the execution, hash tables $P_1, \ldots, P_{12}$ maintained by the simulator
and the hash table E maintained by $F$, we define for any partial chain
$C = (y_i, x_{i+1}, i)$ and any $\ell \in \{1, \ldots, 12\}$ the functions $\mathrm{val}^+_\ell(C)$ and $\mathrm{val}^-_\ell(C)$
as follows: $\mathrm{val}^+_\ell(C)$ is defined as the direct input value $x_\ell$ to permutation $P_\ell$
obtained when moving forward in the Even-Mansour construction (possibly looking
in hash table E to wrap around), or $\bot$ if at some point the computation
stops because the necessary value was missing in some hash table (including E).
Similarly $\mathrm{val}^-_\ell(C)$ is defined as the indirect input value $y_\ell$ to permutation $P_\ell$
obtained when moving backward in the Even-Mansour construction, or $\bot$ if the
computation stops at some point.
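As a rough illustration of the forward function $\mathrm{val}^+_\ell$, here is a small Python sketch. The table layout (`P_fwd[j]` mapping $x_j$ to $y_j$, and `E_dec` mapping a pair (key, ciphertext) to a plaintext), the use of `None` for $\bot$, and the restriction to chains with $0 \le i \le 11$ are assumptions made for the example; unlike the simulator, this function only reads the tables.

```python
R = 12  # number of rounds


def val_plus(C, ell, P_fwd, E_dec):
    """Return x_ell reached by moving forward from chain C, or None for 'bottom'.

    C      : partial chain (y_i, x_{i+1}, i), assumed to satisfy 0 <= i <= 11
    ell    : target position, 1 <= ell <= 12
    P_fwd  : dict j -> dict mapping x_j to y_j (forward hash tables)
    E_dec  : dict mapping (k, ciphertext) to plaintext (decryption side of E)
    """
    y_i, x, i = C
    k = y_i ^ x
    j = i + 1
    while j != ell:
        if x not in P_fwd[j]:
            return None                      # missing permutation value
        y = P_fwd[j][x]
        if j == R:                           # wrap around through hash table E
            y0 = E_dec.get((k, y ^ k))
            if y0 is None:
                return None                  # missing cipher value
            x, j = y0 ^ k, 1
        else:
            x, j = y ^ k, j + 1
    return x


P_fwd = {j: {} for j in range(1, R + 1)}
C = (0x0A, 0x3C, 0)                          # chain (y_0, x_1, 0), key k = y_0 xor x_1
P_fwd[1][0x3C] = 0x55                        # P_1(x_1) = y_1
```

With these tables, `val_plus(C, 1, ...)` is simply $x_1$, `val_plus(C, 2, ...)` is $y_1 \oplus k$, and `val_plus(C, 3, ...)` is `None` because $P_2$ has no entry yet.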

As a preliminary step, we need to exclude some bad events that lead to a
pathological behavior of $\Sigma_2$. These bad events correspond to the draw of bad
values when the simulator randomly defines the value of some permutation $P_j$
or when $F$ draws a random answer. More precisely, the bad values are exactly
those that can be written as the bitwise xor of up to five values in the history,
where the history includes all n-bit strings appearing in hash tables $P_j$ and E
at the moment where the random answer is drawn. Since the size of the history
remains polynomial, the probability of these bad events is negligible.

Then, the proof that the simulator never overwrites an entry in hash tables
$P_i$ during a call to ForceVal roughly consists of two steps. First, we show that
just before the query which leads to some partial chain C being enqueued to
be adapted at position $\ell$, one has $\mathrm{val}^+_{\ell-1}(C) = \bot$ and $\mathrm{val}^-_{\ell+1}(C) = \bot$, unless
an equivalent chain D (where equivalent means that one can obtain D from C
by moving forward or backward in the Even-Mansour construction) has been
previously enqueued. This crucially relies on the fact that the two chain detection
zones (“border” and “center”) are “protecting” each other. For example, consider
the case where some chain $C = (y_0, x_1, 0)$ is enqueued to be adapted at position
$\ell = 4$ due to a query for $P_2(x_2)$. Then clearly, before $P_2(x_2)$ is defined, one has
$\mathrm{val}^+_3(C) = \bot$. On the other side, if $\mathrm{val}^-_5(C) \neq \bot$ then this means that C is
equivalent to some partial chain $D = (y_6, x_7, 6)$ with $y_6 \in P_6^-$ and $x_7 \in P_7^+$, so
that D would have been enqueued previously due to some query to $P_6$ or $P_7$.

The second step is to show that between the moment where C is enqueued, and
the moment where C is dequeued, the completion of other chains (possibly) in
the queue will not lead to $\mathrm{val}^+_{\ell-1}(C) \in P^+_{\ell-1}$ or $\mathrm{val}^-_{\ell+1}(C) \in P^-_{\ell+1}$. In particular
this requires showing that C cannot collide with another, previously enqueued
chain D at round $\ell - 1$ or $\ell + 1$. This is carried out via a careful analysis of all
the ways this could happen, which would all imply the occurrence of the bad
event previously discussed. Once this is done, it is easy to show that no entry
is overwritten during the call to ForceVal when adapting C. To finalize the
reasoning, we use a randomness mapping argument similar to the one that was
introduced in |.' 12 j , and obtain the following lemma. 

Lemma 3. For any distinguisher $\mathcal{D}$ which makes at most q queries in total, we have:



| Pr [x>^{w) = i] -p r [l 


Finally, upper bounding the statistical distance between $\Sigma_3$ and $\Sigma_4$ is easily
handled, and yields the following lemma.

Lemma 4. For any distinguisher $\mathcal{D}$ which makes at most q queries in total, we have:



Combining Lemmas 1, 2, 3 and 4 finally enables us to prove Theorem 1.
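The way the distance lemmas combine is the usual hybrid argument: by the triangle inequality, the advantage of a distinguisher between the simulated world $\Sigma_1$ and the real world $\Sigma_4$ is at most the sum of the three successive distances. The display below only restates this generic step; it does not reproduce the concrete bounds of the lemmas.

```latex
\left| \Pr\!\left[\mathcal{D}^{\Sigma_1} = 1\right] - \Pr\!\left[\mathcal{D}^{\Sigma_4} = 1\right] \right|
\;\le\; \sum_{i=1}^{3}
\left| \Pr\!\left[\mathcal{D}^{\Sigma_i} = 1\right] - \Pr\!\left[\mathcal{D}^{\Sigma_{i+1}} = 1\right] \right|
```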

Remark 1. Our choice to use a keyed two-sided random function and a simulator
T accessing random function tables $\varphi$ in system $\Sigma_2$ allows us to handle uniformly
random values, which slightly simplifies the computation of various bounds in the
proof. It is however possible (and conceptually more satisfying) to use an ideal
cipher enhanced with a Check procedure rather than a keyed two-sided random
function, and random permutation tables rather than random function tables.
This would have some nice effects in the analysis of system $\Sigma_2$; in particular this
would exclude some bad events such as potential overwrites in the hash table
E when $F$ defines an answer by reading table $\eta$ or in hash tables $P_i$ when T
defines an answer by reading tables $\varphi_i$. This kind of approach was taken in [2].

Remark 2. If one contents oneself with weak indifferentiability (where the
simulator is allowed to depend on the distinguisher), one can slightly simplify the
simulator by having it abort when it is about to complete more than q chains of
the form $(y_0, x_1, 0)$; this allows us to get rid of the intermediate system $\Sigma_2$ where
the Check procedure is added to the keyed two-sided random function (or the
ideal cipher) in order to ensure that the simulator makes a polynomial number
of queries and runs in polynomial time with probability 1. Such a simplification
does not seem to be possible if one wants to define a universal simulator which
does not depend on q.

Abstract. We propose new generic key recovery attacks on Feistel-type
block ciphers. The proposed attack is based on the all subkeys recovery 
approach presented in SAC 2012, which determines all subkeys instead 
of the master key. This enables us to construct a key recovery attack 
without taking into account a key scheduling function. With our ad- 
vanced techniques, we apply several key recovery attacks to Feistel-type 
block ciphers. For instance, we show 8-, 9- and 11-round key recovery 
attacks on n-bit Feistel ciphers with 2n-bit key employing random keyed 
F-functions, random F-functions, and SP-type F-functions, respectively. 
Moreover, thanks to the meet-in-the-middle approach, our attack leads 
to low-data complexity. To demonstrate the usefulness of our approach, 
we show a key recovery attack on the 8-round reduced CAST-128, which 
is the best attack with respect to the number of attacked rounds. Since 
our approach derives the lower bounds on the numbers of rounds to be 
secure under the single secret key setting, it can be considered that we 
unveil the limitation of designing an efficient block cipher by a Feistel 
scheme such as a low-latency cipher. 

Keywords: block cipher, key scheduling function, all-subkeys-recovery
attack, meet-in-the-middle attack, key recovery attack, low-data com- 
plexity attack. 

1 Introduction 

A block cipher is considered an essential technology in modern cryptography,
since it is one of the most widely used primitives. Moreover, studies on designing 
a secure and efficient block cipher are useful also for designing other symmetric 
primitives such as hash functions and stream ciphers. Since DES was developed 
in 1977 [19], a lot of progress has taken place in this area. Recently, with the large 
deployment of network devices requiring security, block ciphers satisfying new 
demands such as lightweight and low-latency have received a lot of attention. In 
fact, several block ciphers designed for a lightweight hardware implementation 
have been proposed such as PRESENT [5], KATAN/KTANTAN [TB], LED [2D] 
and Piccolo [34]. The concept of low-latency encryption, which is used for
applications requiring an instant response, was discussed in [23]. Since
low-latency encryption requires a quick response, the number of rounds must be
reduced as much as possible compared to a general-purpose block cipher such
as AES. In 2012, PRINCE was proposed as an instantiation of a low-latency
cipher. Note that PRINCE is not only a low-latency cipher, but also a
lightweight block cipher even when supporting both encryption and decryption.
Those features are considered to be important in practical use of the cipher,
since its lightweightness directly leads to low power and energy consumption,
and supporting a decryption function without much cost allows the cipher to
be used more widely.

In general, an SPN cipher requires an inverse function when supporting
decryption, and thus an SPN cipher with a decryption function needs additional
gate areas. In spite of the fact that PRINCE is an SPN cipher, it is efficiently
implemented even when implementing a decryption function due to its novel
property called α-reflection. However, as pointed out by the designers, it is
known that α-reflection reduces the security of the cipher [12,23,36] and
thus a cipher having α-reflection does not have optimal security. Meanwhile,
it is known that a Feistel cipher, another traditional structure of block
cipher, is suitable for a lightweight block cipher especially when supporting both
encryption and decryption, since it does not require an inverse function. Thus,
a Feistel cipher is considered a possible candidate for a low-latency cipher, if
it has a sufficiently small number of rounds. However, it is still unknown
how many rounds are sufficient for a Feistel cipher to be secure. Note that, for
low-latency encryption, since the key scheduling function can be precomputed,
it can be a heavy function. Thus, its performance with respect to low latency is
considered to mainly depend on the data processing part, namely its number of
rounds. Hence, our question is “how many rounds can be reduced without loss
of security requirements for Feistel schemes”.

In this paper, we tackle the security evaluations of several Feistel schemes,
assuming that the key scheduling function is an ideal function. We deal with
key recovery attacks under the single secret key setting by extending the all
subkeys recovery approach [22]. Since our approach derives lower bounds
on the number of rounds needed to be secure against a key recovery attack even if the
underlying key scheduling function is an ideal function, our results show the
limitation of designing a low-latency encryption by a Feistel scheme. We introduce
several advanced techniques including function reduction and key linearization.
Using those advanced techniques and with the help of the meet-in-the-middle
approach [10,21], we show several key recovery attacks on various Feistel ciphers.
Table 1 summarizes the number of attacked rounds for Feistel schemes by both
distinguishers and key recovery attacks under the single secret key and
known-key settings. Compared to the previous results, some of our attacks are the first
generic key recovery attacks and also the best for several Feistel schemes with
respect to the number of attacked rounds, even if the attacker is allowed to use
the known secret key. Moreover, our attack does not restrict the underlying
F-function to a permutation, which is a limitation of some of the previous attacks.
Furthermore, one of the advantages of our approach is its low data requirement
thanks to the meet-in-the-middle approach, in contrast to the classical
statistical attacks such as an impossible differential attack [B]. As an example for the
practical impact of our work, we show the best attack on the reduced CAST-128
even when its key scheduling function is ideal. Also, we show extremely
low-data attacks on the reduced Camellia [5] with less than 60 data sets.

Table 1. Numbers of Attacked Rounds by Generic Attacks on Feistel Schemes

Single Secret Key Setting
Attack Type                    | Feistel-1        | Feistel-2 | Feistel-3
Distinguisher                  | 5 [30]           | 5 [30]    | 5 [30]
Distinguisher (starred entries, in source order; column placement uncertain) | 6*, 5* [28], 5* [28], 5*, 5*, 5*
Key Recovery Attack (k = 2n)   | 7 [22], 8 (Ours) | 9 (Ours)  | 11 (Ours)
Key Recovery Attack (k = 3n/2) | 5 [22], 6 (Ours) | 7 (Ours)  | 9 (Ours)
Key Recovery Attack (k = n)    | 3 [22], 4 (Ours) | 5 (Ours)  | 7 (Ours)

Known Key Setting
Distinguisher                  | not given        | 7 [26]    | 11* [33]

* : Each F function is restricted to a permutation

This paper is organized as follows: Section 2 gives notations and definitions
used throughout this paper, and gives a brief review of the all subkeys recovery
approach. We review the related work and show its improvement in Section 3.
Our key recovery attacks on two types of Feistel ciphers and their applications
to CAST-128 and Camellia are described in Sections 4 and 5. Section 6 discusses
the usefulness of our attack. Finally, we conclude in Section 7.

2 Preliminary 

In this section, we give notations used throughout this paper, then define our 
target Feistel ciphers. Finally, we briefly review the all subkeys recovery approach 
presented in [33] . 


2.1 Notation 

The following notation will be used throughout this paper:

n : block size.
k : the size of the master key.
$L_i$, $R_i$ : left or right half of the i-th round input.
$K_i$ : the i-th round subkey (n/2 bits).
$\ell$ : the size of an S-box.
m : the number of S-boxes in an S-box layer.
$X_i$ : the i-th round state.
$X_{i,j}$ : the j-th S-box word ($\ell$-bit data) of $X_i$.
$X_{i,L}$, $X_{i,R}$ : left or right half bits of $X_i$.
$a|b$ or $(a|b)$ : concatenation.


2.2 Feistel Cipher 

In this paper, we focus on balanced Feistel networks as illustrated in Fig. 1.
An n-bit plaintext P is divided into two sub-blocks as $P = (L_1, R_1)$, where
$L_1, R_1 \in \{0,1\}^{n/2}$. Then the (i+1)-th round input state is calculated as follows:

$L_{i+1} = R_i \oplus F_{K_i}(L_i), \quad R_{i+1} = L_i,$

where $F_{K_i} : \{0,1\}^{n/2} \to \{0,1\}^{n/2}$ is a keyed function in the i-th round using the
i-th round (n/2)-bit subkey $K_i$. An n-bit ciphertext C for the r-round encryption
function is derived as $C = (R_{r+1}, L_{r+1})$. Note that the last round of the Feistel
cipher does not have a swap operation. Hereafter, the size of each subkey used
in one round is assumed to be half of the block size (i.e., $K_i \in \{0,1\}^{n/2}$).

Fig. 1. Balanced Feistel Network (Feistel-1)
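The round structure can be sketched in a few lines of Python. The 16-bit half-block size and the function `toy_F` are hypothetical choices made only so that the example runs; any keyed function on (n/2)-bit words could be substituted, and it does not need to be a permutation.

```python
MASK = 0xFFFF  # n/2 = 16 bits, i.e. a toy block size n = 32


def toy_F(K, x):
    """Hypothetical keyed round function on 16-bit words (not from the paper)."""
    v = (x ^ K) & MASK
    v = (v * 0x9E37 + 0x79B9) & MASK
    return v ^ (v >> 7)


def encrypt(P, subkeys, F=toy_F):
    """r-round balanced Feistel encryption; returns C = (R_{r+1}, L_{r+1})."""
    L, R = P
    for K in subkeys:
        L, R = R ^ F(K, L), L          # L_{i+1} = R_i xor F_{K_i}(L_i), R_{i+1} = L_i
    return (R, L)                      # no swap after the last round


def decrypt(C, subkeys, F=toy_F):
    """Inverse of encrypt; uses F only in the forward direction."""
    R, L = C                           # C = (R_{r+1}, L_{r+1})
    for K in reversed(subkeys):
        L, R = R, L ^ F(K, R)          # L_i = R_{i+1}, R_i = L_{i+1} xor F_{K_i}(L_i)
    return (L, R)


subkeys = [0x0123, 0x4567, 0x89AB, 0xCDEF, 0x1357]
P = (0xDEAD, 0xBEEF)
C = encrypt(P, subkeys)
```

Decryption never inverts `F`, which is the property that makes Feistel ciphers attractive when both directions must be supported.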

In this work, we deal with three types of Feistel block ciphers illustrated in
Fig. 2. Feistel-1 denotes the Feistel cipher with random keyed F-functions. Each
subkey is assumed to be chosen independently at random. Thus the keyed F-functions
are also independent of each other. In concrete ciphers, each subkey is usually
XORed before an F-function. Feistel-2 reflects such ciphers. In other words, the
output of the F-function $Y_i$ $(= F_{K_i}(X_i))$ is represented as $Y_i = F_i(X_i \oplus K_i)$,
where $F_i$ is a fixed function in the i-th round (not limited to a permutation).
Similarly, Feistel-3 is the Feistel-2 cipher whose $F_i$ is limited to an SP-type
F-function, where each F-function consists of a bijective S-box layer (S-layer) and
a linear diffusion layer (P-layer), and an n/2-bit subkey is XORed before the
S-box layer. Each S-box layer consists of m $\ell$-bit S-boxes (i.e., $m \cdot \ell = n/2$), and
each P-layer consists of an $m \times m$ linear matrix represented as $M_i$. Note that
Feistel-1 includes Feistel-2 and Feistel-3, and Feistel-3 is a subset of Feistel-2.
The size of the master key is denoted as Feistel-[k]. For example, Feistel-2[n] is
the Feistel cipher with fixed F-functions XORed by a subkey before the function
whose master key size is the same as the block size (e.g., a 128-bit block cipher
taking a 128-bit key).

2.3 All Subkeys Recovery Approach [22]

The all subkeys recovery (ASR) attack was proposed by Isobe and Shibutani at
SAC 2012 [22]. The ASR attack is considered an extension of the meet-in-the-middle
(MITM) attack, which mainly exploits a low key-dependency in the key
scheduling function. The basic concept of the ASR attack is guessing all subkeys
instead of the master key so that the attack can be constructed independently
from the structure of the key scheduling function, by regarding all subkeys as
independent variables. Thus the attack can also be applied to a block cipher
having a complex key scheduling function.

Let us briefly review the procedure of the ASR attack. In the ASR attack,
an attacker first determines a t-bit matching state X, where $X \in \{0,1\}^t$. In the
forward direction, the matching state derived from a plaintext P and a set of
subkeys $\mathcal{K}_{(1)}$ by a function $\mathcal{F}_{(1)}$ is represented as $X = \mathcal{F}_{(1)}(P, \mathcal{K}_{(1)})$. Similarly,
the state computed from a ciphertext C and another set of subkeys $\mathcal{K}_{(2)}$ by
a function $\mathcal{F}_{(2)}$ in the backward direction is denoted as $X = \mathcal{F}^{-1}_{(2)}(C, \mathcal{K}_{(2)})$.
$\mathcal{K}_{(3)}$ denotes a set of the remaining subkeys not required for computing X, i.e.,
$|\mathcal{K}_{(1)}| + |\mathcal{K}_{(2)}| + |\mathcal{K}_{(3)}| = r \cdot n/2$. The attacker guesses $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ in parallel,
then checks if the equation $\mathcal{F}_{(1)}(P, \mathcal{K}_{(1)}) = \mathcal{F}^{-1}_{(2)}(C, \mathcal{K}_{(2)})$ holds. Note that the
equation holds when the guessed subkey bits are correct. After this process with
a single pair, it is expected that there will be $2^{r \cdot n/2 - t}$ key candidates; matching
on N pairs reduces this to $2^{r \cdot n/2 - N \cdot t}$. Finally, the attacker
exhaustively searches the correct key from the surviving key candidates. The
required computations of the attack in total $C_{comp}$ using N plaintext/ciphertext
pairs is estimated as

$C_{comp} = \max(2^{|\mathcal{K}_{(1)}|}, 2^{|\mathcal{K}_{(2)}|}) \times N + 2^{r \cdot n/2 - N \cdot t}$. (1)

The number of required plaintext/ciphertext pairs is $\max(N, \lceil (r \cdot n/2 - N \cdot t)/n \rceil)$.
The required memory is about $\min(2^{|\mathcal{K}_{(1)}|}, 2^{|\mathcal{K}_{(2)}|}) \times N$ blocks, which is the cost
of the table used for the matching. Clearly, the ASR attack works faster than the
brute force attack when Eq. (1) is less than $2^k$, which is the required computations
for the brute force attack.
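To make Eq. (1) easier to experiment with, the following Python sketch evaluates the time, data and memory estimates on a log2 scale. The function names and the choice to work with base-2 logarithms are illustrative assumptions; the formulas themselves are the ones stated above.

```python
import math


def log2_sum(a, b):
    """Return log2(2**a + 2**b) without building huge numbers."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1 + 2.0 ** (lo - hi))


def asr_cost(r, n, k1_bits, k2_bits, N, t):
    """Estimates of the ASR attack following Eq. (1).

    r: rounds, n: block size, k1_bits/k2_bits: sizes of the two guessed key sets,
    N: number of plaintext/ciphertext pairs, t: size of the matching state.
    Returns (log2 of time, number of pairs, log2 of memory in blocks).
    """
    total = r * n / 2                                   # all subkey bits
    time = log2_sum(max(k1_bits, k2_bits) + math.log2(N), total - N * t)
    data = max(N, math.ceil((total - N * t) / n))
    memory = min(k1_bits, k2_bits) + math.log2(N)
    return time, data, memory


# Example: 7 rounds of a 128-bit Feistel-1 with a 256-bit key, three rounds
# guessed on each side, n/2-bit matching state, four pairs.
time, data, memory = asr_cost(r=7, n=128, k1_bits=192, k2_bits=192, N=4, t=64)
```

For these parameters the time exponent is a little above 194, well below the brute-force exponent of 256, and the memory exponent is 194.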


3 Generic Key Recovery Attack on Feistel-1 

In this section, we first review key recovery attacks on balanced Feistel networks
presented in [33] and generalize them to Feistel-1[n], -1[3n/2] and -1[2n]. After that,
we show that the basic attack can be improved by using splice and cut [3] and
key linearization techniques. By the improved attack, the numbers of attacked
rounds for Feistel-1 are increased by one round.

For a Feistel-1 cipher, an (n/2)-bit matching state X is computed from a
plaintext P and a set of subkeys $\mathcal{K}_{(1)} \in \{K^{(1)}, K^{(2)}, \ldots, K^{(a-1)}\}$ as shown in
Fig. 1 (i.e., $X = \mathcal{F}_{(1)}(P, \mathcal{K}_{(1)})$). Similarly, the matching state is obtained from
a ciphertext C and another set of subkeys $\mathcal{K}_{(2)} \in \{K^{(a+1)}, K^{(a+2)}, \ldots, K^{(r)}\}$ as
$X = \mathcal{F}^{-1}_{(2)}(C, \mathcal{K}_{(2)})$. Also, X is computed independently from an (n/2)-bit subkey
$K^{(a)}$, i.e., $\mathcal{K}_{(3)} \in \{K^{(a)}\}$.

Fig. 3. Splice and Cut Technique for Feistel-1

3.1 Basic Attack on Feistel-1 [22]

For Feistel-1[2n] (e.g., a 128-bit block cipher accepting a 256-bit key), 7 rounds
of the cipher can be attacked in a straightforward manner, since $\mathcal{F}_{(1)}$ and $\mathcal{F}_{(2)}$
are composed of 3 rounds of the cipher and thus the sizes of $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ are
both $3 \cdot n/2$ bits. In this attack, the total time complexity $C_{comp}$ using four
plaintext/ciphertext pairs is estimated as

$C_{comp} = \max(2^{3n/2}, 2^{3n/2}) \times 4 + 2^{7 \cdot n/2 - 4 \cdot n/2} \approx 2^{3n/2+2}$ $(= 2^{3k/4+2})$.

The required memory is about $4 \times 2^{3n/2}$ blocks. Since $C_{comp}$ is less than $2^{2n}$ $(= 2^k)$
when $n > 4$, the attack works faster than the exhaustive key search.

Similarly, for Feistel-1[3n/2] and Feistel-1[n] (e.g., a 128-bit block cipher
accepting a 192-bit key or a 128-bit key), key recovery attacks of at least 5 and 3
rounds of the cipher are constructed, respectively. For Feistel-1[3n/2], $\mathcal{F}_{(1)}$ and $\mathcal{F}_{(2)}$
consist of 2 rounds of the cipher, and thus the sizes of $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ are both n
bits. Therefore, the required time complexity using 3 plaintext/ciphertext pairs
is estimated as $C_{comp} = \max(2^n, 2^n) \times 3 + 2^{5n/2 - 3n/2} \approx 2^{n+2}$, and the required
memory is about $2^{n+2}$ blocks. For Feistel-1[n], a similar attack on 3 rounds
requiring $2^{n/2+1}$ $(\approx 2^{n/2} \times 2 + 2^{n/2})$ computations and $(2 \times 2^{n/2})$ blocks of memory
is mounted by using 1 round of $\mathcal{F}_{(1)}$ and $\mathcal{F}_{(2)}$. Roughly speaking, when Eq. (1) is
less than $2^k$, the ASR attack works faster than the brute force attack. Therefore,
the necessary condition for the basic ASR attack is that each size of all subkeys
in $\mathcal{F}_{(1)}$ and $\mathcal{F}_{(2)}$ is less than the size of the master key.
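The 3-round attack on Feistel-1[n] is small enough to run in full on a toy cipher. The sketch below uses 8-bit halves (n = 16) and a hypothetical keyed function `F`, both chosen only for illustration. The matching state is $X = L_2 = R_3$, which is computed from the plaintext with $K_1$ alone and from the ciphertext with $K_3$ alone, so $K_2$ plays the role of $\mathcal{K}_{(3)}$.

```python
import random

MASK = 0xFF  # n/2 = 8 bits, so n = 16 and the three subkeys total 24 bits


def F(K, x):
    """Hypothetical keyed function on 8-bit words (stand-in for a random keyed F)."""
    v = (x * 167 + K * 59 + 13) & MASK
    v ^= v >> 3
    return (v * 101 + K) & MASK


def encrypt(P, keys):
    L, R = P
    for K in keys:
        L, R = R ^ F(K, L), L
    return (R, L)                                     # C = (R_4, L_4)


def asr_attack(pairs):
    """Recover all (K1, K2, K3) consistent with the given plaintext/ciphertext pairs."""
    # Forward: X = L_2 = R_1 xor F_{K1}(L_1), tabulated for every guess of K1.
    table = {}
    for K1 in range(256):
        X = tuple(P[1] ^ F(K1, P[0]) for P, _ in pairs)
        table.setdefault(X, []).append(K1)
    # Backward: C = (R_4, L_4) and X = R_3 = L_4 xor F_{K3}(R_4) for every guess of K3.
    candidates = []
    for K3 in range(256):
        X = tuple(C[1] ^ F(K3, C[0]) for _, C in pairs)
        candidates += [(K1, K3) for K1 in table.get(X, [])]
    # Exhaustive search over the remaining subkey K2 on the surviving candidates.
    return [(K1, K2, K3) for (K1, K3) in candidates for K2 in range(256)
            if all(encrypt(P, (K1, K2, K3)) == C for P, C in pairs)]


rng = random.Random(7)
secret = tuple(rng.randrange(256) for _ in range(3))
plaintexts = [(rng.randrange(256), rng.randrange(256)) for _ in range(2)]
pairs = [(P, encrypt(P, secret)) for P in plaintexts]
recovered = asr_attack(pairs)
```

The correct subkeys always survive the matching step, so `secret` is guaranteed to appear in `recovered`; with only two pairs a few other consistent keys may remain as well.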


3.2 Improved Attack on Feistel-1 

We demonstrate that the basic attack on Feistel-1 presented in [22] is improved
by controlling the value of plaintexts. It allows us to attack one more round on
Feistel-1, e.g., an 8-round attack on Feistel-1[2n].

Suppose that an input $L_1$ $(= R_2)$ is fixed to an arbitrary (n/2)-bit constant
CON, then $L_2$ is expressed as $L_2 = R_1 \oplus K'_1$, where $K'_1 = F_1(K_1 \oplus \mathrm{CON})$.
Since $K'_1$ depends only on $K_1$, it is regarded that a new (n/2)-bit subkey $K'_1$
is linearly inserted in the first round without an F-function, which is called key
linearization.

As shown in Fig. 3, since $K'_1$ can be divided into two (n/4)-bit words $K'_{1L}$
and $K'_{1R}$, the splice and cut technique in [3] enables us to separately use $K'_{1L}$
and $K'_{1R}$ in $\mathcal{F}_{(1)}$ and $\mathcal{F}_{(2)}$, respectively. Note that, in the splice and cut
technique, the MITM attack starts from multiple values of start states for parallel
guesses of $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$, while the basic MITM attack starts from multiple
plaintext/ciphertext pairs.

For Feistel-1[2n], an 8-round generic key recovery attack is mounted thanks to
the splice and cut technique, while each cost (namely time, memory and data)
for the attack is increased by $O(2^{n/4})$ compared to the basic attack. The size of
each key set $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ is increased by (n/4) bits due to the splice and cut,
and thus the size of each set $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ is $7n/4$ $(= 3 \cdot n/2 + n/4)$ bits. In
this attack, the total time complexity $C_{comp}$ using five start states is estimated
as

$C_{comp} = \max(2^{7n/4}, 2^{7n/4}) \times 5 + 2^{8 \cdot n/2 - 5 \cdot n/2} \approx 2^{7n/4+3}$ $(= 2^{7k/8+3})$.

The required memory is about $5 \times 2^{7n/4}$ blocks. Since (n/4) bits of plaintexts are
varied depending on $\mathcal{K}_{(2)}$ and the start states, the required data is $2^{n/4}$ chosen
plaintexts when the other 3n/4 bits of the start state are fixed.

For Feistel-1[3n/2] and Feistel-1[n], by using the splice and cut technique, key
recovery attacks of at least 6 and 4 rounds of the cipher are constructed,
respectively. For Feistel-1[3n/2], the sizes of $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ are 5n/4 bits each.
Therefore, the required time complexity with four start states is estimated as
$C_{comp} = \max(2^{5n/4}, 2^{5n/4}) \times 4 + 2^{6n/2 - 4n/2} \approx 2^{5n/4+2}$, and the required
memory is about $2^{5n/4+2}$ blocks. For Feistel-1[n], a similar attack requiring $2^{3n/4+2}$
$(\approx 2^{3n/4} \times 3 + 2^{n/2})$ computations and $(3 \times 2^{3n/4})$ blocks of memory is mounted.
These attacks also require $2^{n/4}$ chosen plaintexts. Those results are summarized
in Table HJ 

4 Key Recovery Attack on Feistel-2 

This section shows generic key recovery attacks on Feistel-2 ciphers. In
contrast to Feistel-1 ciphers, key injections of Feistel-2 ciphers are restricted to
XOR operations. This allows an attacker to equivalently transform subkeys,
so that more rounds can be attacked. To begin with, we introduce an advanced
technique called function reduction, which enables us to reduce the number of
involved subkey bits by exploiting degrees of freedom of a plaintext/ciphertext
pair. Combining it with a (multi-)collision technique, 5-, 7- and 9-round attacks
on Feistel-2[n], -2[3n/2] and -2[2n] are demonstrated, respectively. The overview
of the function reduction is depicted in Fig. 4. The required complexities for
those attacks are summarized in Table 2 and the overview of the attacks is
illustrated in Fig. 5. Note that the key additions of Feistel-2 are limited to XOR
operations; however, a similar idea may be applied to other key additions such
as modular additions. Moreover, as an application of our approach on Feistel-2,
we show a key recovery attack on the reduced CAST-128. The structure of
CAST-128 is similar to Feistel-2; however, the size of each round key of CAST-128
is larger than that of Feistel-2 and the key additions are not only XOR
operations but also modular additions and subtractions. Since a larger round
key generally requires more computations to guess, it seems hard to
directly mount an attack on CAST-128. We use the improved function reduction
technique to make an attack feasible, then show a key recovery attack on the
8-round reduced CAST-128, which is the best attack known in the literature.

Fig. 4. Function Reduction Technique

4.1 Function Reduction Technique 

Suppose that the half outputs of the r-round Feistel-2 cipher $L_{r+1}$ and $R_{r+1}$
are represented by functions $\mathcal{F}_{L,r}$ and $\mathcal{F}_{R,r}$ as $L_{r+1} = \mathcal{F}_{L,r}(\mathcal{K}_L, L_1|R_1)$ and
$R_{r+1} = \mathcal{F}_{R,r}(\mathcal{K}_R, L_1|R_1)$, where $\mathcal{K}_L$ and $\mathcal{K}_R$ denote sets of subkeys used in $\mathcal{F}_{L,r}$
and $\mathcal{F}_{R,r}$, respectively. In general, after a sufficient number of round operations,
all subkeys are required to compute $L_{r+1}$, i.e., $|\mathcal{K}_L| = n/2 \cdot r$, while $R_{r+1}$ is
derived independently from the last subkey $K_r$, i.e., $|\mathcal{K}_R| = n/2 \cdot (r - 1)$. For the
Feistel-2 cipher, fixing half bits of inputs, one more round of subkey data can be
reduced as follows:

Theorem 1 (Function Reduction). For the Feistel-2 cipher, if $L_1$ is fixed,
$\mathcal{K}_L$ and $\mathcal{K}_R$ used in $\mathcal{F}_{L,r}$ and $\mathcal{F}_{R,r}$ contain at most $(n/2 \cdot r)$ and $(n/2 \cdot (r - 2))$
subkey bits when r is odd, and contain at most $(n/2 \cdot (r - 1))$ and $(n/2 \cdot (r - 1))$
subkey bits when r is even, respectively.

Proof. By using the key linearization, $L_2$ is considered to be linearly affected by
the subkey $K'_1$ as follows. Assuming that $L_1$ is an arbitrary (n/2)-bit constant
CON, $L_2$ and $R_2$ are expressed as $L_2 = R_1 \oplus K'_1$ and $R_2 = \mathrm{CON}$, where
$K'_1 = F(K_1 \oplus \mathrm{CON})$ ¹. Since $K'_1$ depends only on $K_1$, it can be regarded as a
new subkey instead of $K_1$ (see Fig. 4-(b)). By using an equivalent transform,
$K'_1$ is moved to the end of the cipher as shown in Figs. 4-(c) and (d). After the
transform, each subkey introduced in an even round is XORed with $K'_1$, and thus
it can be redefined as $K'_p = K_p \oplus K'_1$ (p is even). When r is even, $K'_1$ linearly
affects $R_{r+1}$ at the end as shown in Fig. 4-(c). Therefore, both $L_{r+1}$ and
$R_{r+1}$ contain at most $(n/2 \cdot (r - 1))$ bits of subkeys. When r is odd, $K'_1$ linearly
affects $L_{r+1}$ at the end as shown in Fig. 4-(d). Consequently, $R_{r+1}$ contains
at most $(n/2 \cdot (r - 2))$ bits of subkeys, while the amount of subkey bits required
for computing $L_{r+1}$ is not reduced (i.e., $|\mathcal{K}_L| = n/2 \cdot r$). □

¹ For simplicity, we assume that all F-functions are identical. However, our attack
works even if each F-function is distinct from each other.

Fig. 5. Key Recovery Attacks on Feistel-2 Ciphers: (b) 7-round Attack on Feistel-2[3n/2]; (c) 9-round Attack on Feistel-2[2n]
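The equivalent transform used in the proof can be checked numerically. The Python sketch below compares an ordinary r-round Feistel-2 encryption with $L_1 = \mathrm{CON}$ against the transformed cipher, in which $K'_1$ is removed from the first round, added to every even-round subkey, and re-inserted at the end. The 16-bit word size and the fixed function `F` are hypothetical choices for illustration only.

```python
import random

MASK = 0xFFFF


def F(x):
    """Hypothetical fixed F-function on 16-bit words (need not be a permutation)."""
    v = (x * 0x6E2B + 0x1D3F) & MASK
    return v ^ (v >> 5)


def feistel2(L1, R1, keys):
    """Plain Feistel-2: returns the final state (L_{r+1}, R_{r+1})."""
    L, R = L1, R1
    for K in keys:
        L, R = R ^ F(L ^ K), L
    return L, R


def feistel2_reduced(R1, CON, keys):
    """Same computation after key linearization and the equivalent transform."""
    k1p = F(keys[0] ^ CON)                  # K'_1 = F(K_1 xor CON)
    L, R = R1, CON                          # state after round 1 with K'_1 stripped off
    for p, K in enumerate(keys[1:], start=2):
        Kp = K ^ k1p if p % 2 == 0 else K   # K'_p = K_p xor K'_1 for even p
        L, R = R ^ F(L ^ Kp), L
    if len(keys) % 2 == 0:
        R ^= k1p                            # r even: K'_1 re-enters on R_{r+1}
    else:
        L ^= k1p                            # r odd: K'_1 re-enters on L_{r+1}
    return L, R


rng = random.Random(1)
agree = True
for r in range(1, 10):
    keys = [rng.randrange(1 << 16) for _ in range(r)]
    CON, R1 = rng.randrange(1 << 16), rng.randrange(1 << 16)
    agree = agree and feistel2(CON, R1, keys) == feistel2_reduced(R1, CON, keys)
```

In the transformed cipher the half that does not receive the final XOR with $K'_1$ depends on one subkey fewer, which is the saving stated in Theorem 1.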

The function reduction technique, which consists of equivalent transforms of
round keys and the key linearization, is related to the complementation
properties of Feistel networks in which the round keys of even (or odd) rounds are
complemented by some fixed values. It essentially exploits the property of Feistel
networks that an input of a keyed F-function in the i-th round ($L_i$) linearly
affects an input of a keyed F-function in the (i+2)-th round ($L_{i+2}$). In other
words, the relation of $L_i$ and $L_{i+2}$ is expressed as $L_{i+2} = L_i \oplus X_{i+1}$, where $X_{i+1}$
is an output of an F-function of ($L_{i+1}$). We exploit it in the line of a MITM attack
to reduce the subkey data for the computation of the intermediate values,
while the previous attacks are used for differential attacks [15I8| and speeding 
up key searches using equivalent keys [7,18].

4.2 Key Recovery Attack on 5-Round Feistel-2[n]

In order to apply the function reduction to both the forward and backward 
computations, we prepare plaintext /ciphertext pairs in the form of L\ = CONi 
and i?6 = CON- 2 . where CONi and CON -2 denote arbitrary (n/2)-bit constants. 

Let R 4 be an (n/2)-bit matching state. From Theorem [TJ in the forward 
computation, R 4 can be computed by an (n/2)-bit subkey K 2 (= K 2 ® K[), 
where K[ = F{K\ ® CONi). In the backward computation, since R 4 can be 
regarded as an output of the even round (r = 2), R 4 can also be computed by an 
(n/2)-bit subkey K 4 (= K 4 ® K' 5 ), where K ' 5 = F(K 5 ® CON 2 ), i.e., /C(i) € K 2 
and /C( 2 ) £ K' 4 . Since |/C(i)| = |/C( 2 )| = n / 2 and the size of the matching state 
is also n/ 2, two plaintext /ciphertext pairs are sufficient to determine JC^ and 
/C( 2 ). In order to obtain such two pairs that have the form of L\ = CONi 
and Re = CON 2 , we use 2"/ 4 chosen plaintexts by randomly changing R\ as 
P = ( CONi\Ri ). After this process, we have 2 n / 4 corresponding ciphertexts, 
and thus there will exist (n/2) bits colliding R e with high probability due to the 
birthday paradox. 

The time complexity of determining K 2 and K 4 by the MITM approach is 
estimated as C comp = max(2"/ 2 , 2"/ 2 ) x 2 = T l /‘ 2+l . In order to determine all 
subkeys, we use the following equation: F(R 4 ® K 2j ) = R[ ® K[ ® Lq ® = 

Ri ® Lq ® K", where K’{ = K[® K$. Since R 4 can be computed from K 2 or 
K 4 , we can recursively mount the MITM approach to determine K 3 and K" 
with complexity of 2”/ 2+1 (= max(2 n / 2 , 2"/ 2 ) x 2). After exhaustively guessing 
K\ with a time complexity of 2"/ 2 , all subkeys Kj (1 < i < 5) are determined 
from the previously obtained subkeys K' 2 , K 4 and K”. Therefore, the whole time 
complexity is estimated as 2 n / 2+2 (w 2"/ 2+1 + 2”/ 2+1 + 2”/ 2 ). Due to k = n, the 
time complexity 2”/ 2+2 = 2 k / 2+2 is less than 2 k which is required computations 
for the brute force attack. The required data is 2”/ 4 chosen plaintext, and the 
required memory is about 2”/ 2+1 words. If the function reduction technique is 
used only in the forward computation, a 4-round attack is constructed with less 
data (see Fig. 03/ a) and Table [2]). 
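
The two one-subkey expressions for $R_4$ can be checked on a toy instance. The following is a minimal sketch, not the authors' implementation: it assumes an 8-bit branch ($n/2 = 8$), a hypothetical public F-function built from a randomly shuffled table, and it enumerates all values of $R_1$ to find the colliding pair instead of sampling $2^{n/4}$ plaintexts. All function names are illustrative.

```python
import random

W = 8                                  # toy branch width n/2 (assumption)
rng = random.Random(2013)
SBOX = list(range(1 << W))
rng.shuffle(SBOX)                      # hypothetical public F-function


def F(x):
    return SBOX[x]


def encrypt(keys, L, R):
    """Feistel-2 rounds: L_{i+1} = R_i xor F(L_i xor K_i), R_{i+1} = L_i."""
    for k in keys:
        L, R = R ^ F(L ^ k), L
    return L, R                        # (L_6, R_6) for five round keys


def forward_R4(con1, R1, k2p):
    """R_4 from the plaintext side using only K'_2 = K_2 xor K'_1."""
    return con1 ^ F(R1 ^ k2p)


def backward_R4(con2, L6, k4p):
    """R_4 from the ciphertext side using only K'_4 = K_4 xor K'_5."""
    return con2 ^ F(L6 ^ k4p)


def collect_pairs(keys, con1):
    """Find two plaintexts (CON_1 | R_1) whose ciphertexts share R_6 = CON_2."""
    seen = {}
    for R1 in range(1 << W):
        L6, R6 = encrypt(keys, con1, R1)
        if R6 in seen:
            return R6, [seen[R6], (R1, L6)]
        seen[R6] = (R1, L6)
    raise RuntimeError("no collision on R_6; try another CON_1")


def mitm(con1, con2, pairs):
    """Meet in the middle on R_4: tabulate forward guesses, probe with backward ones."""
    table = {}
    for k2p in range(1 << W):
        v = tuple(forward_R4(con1, R1, k2p) for R1, _ in pairs)
        table.setdefault(v, []).append(k2p)
    candidates = []
    for k4p in range(1 << W):
        v = tuple(backward_R4(con2, L6, k4p) for _, L6 in pairs)
        candidates += [(k2p, k4p) for k2p in table.get(v, [])]
    return candidates


keys = [rng.randrange(1 << W) for _ in range(5)]
CON1 = 0x3C
CON2, pairs = collect_pairs(keys, CON1)
k1p = F(keys[0] ^ CON1)                # K'_1
k5p = F(keys[4] ^ CON2)                # K'_5
true_pair = (keys[1] ^ k1p, keys[3] ^ k5p)
candidates = mitm(CON1, CON2, pairs)
print(true_pair in candidates, len(candidates))
```

With two pairs the filter is only as wide as the guessed key material, so a few false candidates may survive alongside the correct $(K'_2, K'_4)$; further pairs or the recursive step described above would remove them.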


4.3 Key Recovery Attack on 9-Round Feistel-2[2n]

A key recovery attack on a 9-round Feistel-2[2n] is constructed in a similar
way to the 5-round attack on Feistel-2[n]. In this attack, we can add 2 more
rounds in each direction, and a 6-multicollision is required to obtain the desired
plaintext/ciphertext pairs, unlike the attack on Feistel-2[n]. It is known
that an $n$-bit $t$-multicollision is found in $t!^{1/t} \cdot 2^{n(t-1)/t}$ random data with high
probability [33]. Thus, the six plaintext/ciphertext pairs of the form $P =
(CON_1|R_1)$ and $C = (CON_2|L_{10})$ can be found from $6!^{1/6} \cdot (2^{n/2})^{5/6} \approx 3 \cdot
(2^{n/2})^{5/6}$ chosen plaintexts. More precisely, after querying $3 \cdot (2^{n/2})^{5/6}$ chosen
plaintexts with distinct $R_1$, there will exist a 6-collision of $R_{10}$ in the corresponding
ciphertexts with high probability (see Fig. 5-(c) and Table 2).
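
The multicollision data estimate can be evaluated numerically. This is a small sketch of the formula $t!^{1/t} \cdot 2^{b(t-1)/t}$ only; the function names and the choice $n = 64$ in the usage line are illustrative assumptions.

```python
import math


def multicollision_factor(t):
    """The constant t!^(1/t) in front of the power of two."""
    return math.factorial(t) ** (1.0 / t)


def multicollision_queries_log2(t, bits):
    """log2 of t!^(1/t) * 2^(bits*(t-1)/t), the data needed for a t-collision on `bits` bits."""
    return math.log2(multicollision_factor(t)) + bits * (t - 1) / t


# 6-collision on an (n/2)-bit half with n = 64 (illustrative block size)
print(multicollision_factor(6), multicollision_queries_log2(6, 32))
```

For $t = 6$ the factor is slightly below 3, which is the constant used in the text.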
Table 2. Details of Our Attacks

Target     Key size  Round  Time              Memory            Data                                  Reference
Feistel-1  n         4      2^{3n/4+2}        2^{3n/4+2}        2^{n/4}                               Sect. 3
           3n/2      6      2^{5n/4+2}        2^{5n/4+2}        2^{n/4}                               Sect. 3
           2n        8      2^{7n/4+3}        2^{7n/4+3}        2^{n/4}                               Sect. 3
Feistel-2  n         4      2^{n/2+2}         2^{n/2+1}         2                                     Sect. 4.2
                     5      2^{n/2+2}         2^{n/2+1}         2^{n/4}                               Sect. 4.2
           3n/2      6      2^{5n/4+4}        2^{n+4}           9                                     Sect. 4.4
                     7      2^{5n/4+4}        2^{n+4}           9!^{1/9} · (2^{n/2})^{8/9}            Sect. 4.4
           2n        8      2^{3n/2+3}        2^{3n/2+3}        6                                     Sect. 4.3
                     9      2^{3n/2+3}        2^{3n/2+3}        6!^{1/6} · (2^{n/2})^{5/6}            Sect. 4.3
Feistel-3  n         7      2^{3n/4+ℓ} · N_1  2^{3n/4+ℓ} · N_1  N_1                                   Sect. 5.3
           3n/2      8      2^{n+ℓ} · N_2     2^{n+ℓ} · N_2     N_2                                   Sect. 5.5
                     9      2^{n+ℓ} · N_2     2^{n+ℓ} · N_2     N_2!^{1/N_2} · (2^{n/2})^{(N_2-1)/N_2}  Sect. 5.5
           2n        11     2^{7n/4+ℓ} · N_3  2^{7n/4+ℓ} · N_3  N_3                                   Sect. 5.4

N_1 = (3n/2 + 2ℓ)/ℓ, N_2 = (2n + 2ℓ)/ℓ, N_3 = (7n/2 + 2ℓ)/ℓ


4.4 Key Recovery Attack on 7-Round Feistel-2[3n/2]

In this attack, $R_5$ is used as the matching state. From Theorem 1 in the forward
computation, $R_5$ can be computed from $3 \cdot n/2$ bits of subkeys $K'_2$, $K_3$ and $K'_1$,
where $K'_2 = K_2 \oplus K'_1$ and $K'_1 = F(K_1 \oplus CON_1)$. In the backward computation,
$R_5$ can be computed from $3 \cdot n/2$ bits of subkeys $K'_6$, $K_5$ and $K'_7$, where $K'_6 =
K_6 \oplus K'_7$ and $K'_7 = F(K_7 \oplus CON_2)$. Since $R_5$ is expressed as $K'_1 \oplus L_4$ and
$K'_7 \oplus (F(R_6 \oplus K_5) \oplus L_6)$, if only $(n/4)$ bits of $K'_1 \oplus K'_7$ are guessed,
$(n/4)$-bit matching is feasible. It is regarded that $K''_1$ $(= K'_1 \oplus K'_7)$ is included in the
backward computation (see Fig. 5-(b)).

Then, since $|\mathcal{K}_{(1)}| = n$, $|\mathcal{K}_{(2)}| = 5n/4$, and the size of the matching state
is $n/4$, nine plaintext/ciphertext pairs are required to determine $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$
due to the relation $(n + 5n/4)/(n/4) = 9$. Such nine plaintext/ciphertext pairs
of the form $P = (CON_1|R_1)$ and $C = (CON_2|L_8)$ can be found from $9!^{1/9} \cdot
(2^{n/2})^{8/9} \approx 4.2 \cdot (2^{n/2})^{8/9}$ chosen plaintexts. The other complexities required for
this attack and the low-data attack on 6-round Feistel-2[3n/2] are described in
Table 2.

4.5 Application to 8-Round Reduced CAST-128 

In order to demonstrate the practical impact of our work on Feistel-2, we apply
it to the CAST-128 block cipher. Using the improved function reduction techniques,
we show an attack on the 8-round reduced CAST-128 with a key of more than 118 bits,
which is the best attack with respect to the number of attacked rounds in
the literature even when its key scheduling is an ideal function.
Description of CAST-128. CAST-128 [1,2] is a 64-bit Feistel block cipher
accepting a variable key size from 40 up to 128 bits (but only in 8-bit increments).
The number of rounds is 16 when the key size is longer than 80 bits. First, the
algorithm divides the 64-bit plaintext into two 32-bit words $L_0$ and $R_0$, then the
$i$-th round function outputs two 32-bit data $L_{i+1}$ and $R_{i+1}$ as follows:

$$L_{i+1} = R_i \oplus F_i(L_i, K^{rnd}_i), \quad R_{i+1} = L_i,$$

where $F_i$ denotes the $i$-th round function and $K^{rnd}_i$ is the $i$-th round key
consisting of a 32-bit masking key $K_{m_i}$ and a 5-bit rotation key $K_{r_i}$. The detail of
$F_i$ is expressed as

$$F_i = f((L_i \odot_i K_{m_i}) \lll K_{r_i}),$$

where $f$ consists of four 8 to 32-bit S-boxes, $\lll K_{r_i}$ denotes a $K_{r_i}$-bit left
rotation, and $\odot_i$ denotes addition, XOR or subtraction depending on the round
number $i$, i.e., $\odot_i$ denotes addition for $i \in \{1, 4, 7, 10, 13\}$, XOR for $i \in \{2, 5, 8, 11, 14\}$
and subtraction for $i \in \{3, 6, 9, 12, 15\}$. We omit the details of $f$, since, in our
analysis, it is regarded as a random function that outputs a 32-bit random
value from a 32-bit input.

Key Recovery Attack on 8-Round CAST-128. The structure and the
parameters of CAST-128 with a sufficiently large key are similar to Feistel-2[2n].
However, for CAST-128, a 37 (= 32 + 5)-bit subkey is inserted into each $F_i$, i.e.,
a 32-bit subkey is used in $\odot_i$ and the remaining 5-bit subkey is used in a
key-dependent rotation, while a 32-bit subkey is inserted in each round for
Feistel-2[2n] with $n = 32$. Thus, the 9-round attack on Feistel-2[2n] is not directly
applicable to CAST-128. However, the improved function reduction technique
allows us to construct an 8-round attack on CAST-128.

Let $R_5$ be an $(n/2)$-bit matching state. In the backward computation, $R_9$ is
fixed as $CON$, and $K'_8 = f((CON \odot_8 K_{m_8}) \lll K_{r_8})$ is moved to $L_5$ and an input
of the 7-th round function, by converting $K_{m_5}$ into $K'_{m_5} = K'_8 \oplus K_{m_5}$, as shown in
Fig. 6. Then, the input of $f$ in the 7-th round is expressed as $(L_9 \oplus K'_8) + K_{m_7}$.
If the lower $b$ bits of $L_9$, which are controllable by the ciphertext, are fixed
to 0, the lower $b$ bits of this computation are expressed as $K'_8 + K_{m_7}$. Thus,
$(K'_8 + K_{m_7})$ is regarded as a new $b$-bit subkey $K'_{m_7} = (K'_8 + K_{m_7})$, while the
upper $(n/2 - b)$ bits remain $(L_9 \oplus K'_8) + K_{m_7}$. In the backward computation of
$R_5$, $|\mathcal{K}_{(2)}| = 37 \times 2 + (b + (n/2 - b) \times 2 + 5)$ bits of the key are involved.
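
The claim about the lower $b$ bits rests on a simple property of modular addition: the lower $b$ bits of a sum depend only on the lower $b$ bits of the summands. The following sketch checks this for random 32-bit values; the helper names are illustrative and the round function $f$ itself is not modelled.

```python
import random

MASK32 = 0xFFFFFFFF


def f7_input(L9, k8p, km7):
    """Input of f in round 7 before rotation: (L_9 xor K'_8) + K_m7 mod 2^32."""
    return ((L9 ^ k8p) + km7) & MASK32


def low_bits(x, b):
    return x & ((1 << b) - 1)


def check_low_bits(b, trials=1000, seed=7):
    """True if, whenever the lower b bits of L_9 are zero, the lower b bits of the
    round-7 input equal the lower b bits of K'_8 + K_m7 for every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        k8p, km7 = rng.getrandbits(32), rng.getrandbits(32)
        L9 = rng.getrandbits(32 - b) << b      # lower b bits fixed to 0
        if low_bits(f7_input(L9, k8p, km7), b) != low_bits(k8p + km7, b):
            return False
    return True


print(check_low_bits(29))
```

This is why the lower $b$ bits of $K'_8$ and $K_{m_7}$ need not be guessed separately on those bit positions: only their sum modulo $2^b$ enters the computation.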


Fig. 7. Matching without Matrix 

Evaluation. Since $|\mathcal{K}_{(1)}| = 111$, $|\mathcal{K}_{(2)}| = 114$ ($b = 29$) and the size of the
matching state is 32 bits, eight plaintext/ciphertext pairs are required to
determine $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ due to the relation $(111 + 114)/32 < 8\ (= 2^{32-29})$. The
required time complexity to determine subkeys $K^{rnd}_1$, $K^{rnd}_2$, $K^{rnd}_3$, $K^{rnd}_6$, $K'_{m_5}$,
$K_{r_5}$, the lower 29 bits of $K'_{m_7}$, the upper 3 bits of $K'_8$, and $K_{m_7}$ is estimated
as $C_{comp} = \max(2^{111}, 2^{114}) \times 10 \approx 2^{118}$. The remaining subkey bits are
exhaustively searched with the time complexity of $2^{64}$. Then, all subkeys are
obtained by using the relations $K'_{m_7} = K'_8 + K_{m_7}$, $K'_{m_5} = K'_8 \oplus K_{m_5}$ and
$K'_8 = f((CON \odot_8 K_{m_8}) \lll K_{r_8})$. The required data is eight chosen ciphertexts,
and the required memory is $2^{111}$ words. Therefore, when the key size is more
than 118 bits long, our attack works faster than the brute force attack.

5 Key Recovery Attack on Feistel-3 

This section presents generic key recovery attacks on Feistel-3 ciphers. Feistel-3
ciphers are the Feistel-2 ciphers whose F-functions are restricted to be SP-type
F-functions, which consist of an S-box layer followed by a linear matrix operation.
This allows an attacker to exploit the linearity of the matrix computation, and thus
the number of attacked rounds can be increased. To begin with, we review two
techniques which exploit the linearity of the matrix computation. We refer to those
two techniques as matching without matrix and matrix separation to make our
explanation simple. However, those techniques have already been introduced,
for example, in [29,32]. Combining them with a (multi-)collision technique and
function reduction, 7-, 9- and 11-round attacks on Feistel-3[n], -3[3n/2] and -3[2n]
are demonstrated, respectively. Furthermore, as an application of our approach
on Feistel-3, we show several key recovery attacks on the reduced Camellia [5].
Since Camellia is a Feistel cipher with SP-type F-functions, our attack on
Feistel-3 can be directly applied to it even if its key scheduling function is ideal. Besides,
the number of rounds attacked by our attack is further increased by one round
for Camellia due to its non-MDS matrix. Consequently, we present generic key
recovery attacks requiring extremely low data on the 8-, 10- and 12-round reduced
Camellia without $FL/FL^{-1}$ functions and key whitenings.
Fig. 8. Matrix Separation


5.1 Matching without Matrix [32]

Let us consider three consecutive rounds of the Feistel-3 cipher whose input and
output are represented as $(L_i|R_i)$ and $(L_{i+3}|R_{i+3})$ as shown in Fig. 7. Assuming
that an attacker knows those input and output variables, the following equation
holds:

$$F_i(L_i \oplus K_i) \oplus R_i = F_{i+2}(R_{i+3} \oplus K_{i+2}) \oplus L_{i+3}. \quad (2)$$

In order to check if the equation holds, we need to guess $2 \cdot n/2$ bits of subkeys
$K_i$ and $K_{i+2}$, while $K_{i+1}$ need not be guessed. However, if the F-functions
are SP-type F-functions (i.e., $F_i = M_i \circ S_i$, where $M_i$ and $S_i$ denote an $m \times m$
matrix and an S-box layer consisting of $m$ $\ell$-bit S-boxes, respectively), the number
of subkey bits to be guessed can be reduced by exploiting the linearity of the matrix
operation. Since $M_i$ is a linear function, Eq. (2) is rewritten as:

$$M_i(S_i(L_i \oplus K_i)) \oplus R_i = M_{i+2}(S_{i+2}(R_{i+3} \oplus K_{i+2})) \oplus L_{i+3},$$

$$M_i(S_i(L_i \oplus K_i) \oplus M_i^{-1}(R_i)) = M_{i+2}(S_{i+2}(R_{i+3} \oplus K_{i+2}) \oplus M_{i+2}^{-1}(L_{i+3})).$$

When $M_i = M_{i+2}$, we have

$$S_i(L_i \oplus K_i) \oplus M_i^{-1}(R_i) = S_{i+2}(R_{i+3} \oplus K_{i+2}) \oplus M_{i+2}^{-1}(L_{i+3}). \quad (3)$$

Unlike Eq. (2), we can check Eq. (3) separately in units of the S-box size $\ell$.
Therefore, this technique enables us to reduce the number of subkey candidates
to be guessed for the 3-round matching from $2^{2 \cdot n/2}$ to $2^{2\ell}$. When $M_i \ne
M_{i+2}$, the matching technique called matching through matrix presented in [31]
is utilized. In this case, more than $m \cdot \ell$ bits of subkeys are required to be guessed.
For simplicity, from now on, we assume that $M_i = M_{i+2}$.

In the function reduction, the modified subkey $K'_1$ affects $L_{2t}$ and $R_{2t+1}$ ($t =
1, 2, \ldots$). Also, in the matching without matrix, we utilize the relation of $L_{i+1}$ as
the matching state. This implies that if $(i+1)$ is even (i.e., $(i+1) = 2t$), $L_{i+1}$
is affected by $K'_1$ and it cannot be used as the matching state. Therefore, if the
matching without matrix is used with the function reduction, the starting round
of the matching $i$ must be even (i.e., $(i+1)$ must be odd).
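
The word-wise check of Eq. (3) can be illustrated on a toy SP-type F-function. This is a sketch under stated assumptions, not the authors' construction: $m = 4$ words of $\ell = 4$ bits, an arbitrary 4-bit S-box table, and a hypothetical involutive binary matrix in which each output word is the XOR of the other three input words (so $M^{-1} = M$).

```python
import random

# Arbitrary 4-bit bijective S-box used only for illustration (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def S(x):
    return [SBOX[v] for v in x]


def M(x):
    """Hypothetical linear layer: each output word is the XOR of the other three inputs."""
    t = x[0] ^ x[1] ^ x[2] ^ x[3]
    return [t ^ v for v in x]


M_inv = M  # this particular M is an involution over GF(2)


def round_fn(L, R, k):
    """One Feistel-3 round: L' = R xor M(S(L xor K)), R' = L."""
    return xor(R, M(S(xor(L, k)))), L


def word_match(j, L, R, L3, R3, ka, kc):
    """Check word j of Eq. (3) using only word j of K_i (ka) and of K_{i+2} (kc)."""
    lhs = SBOX[L[j] ^ ka] ^ M_inv(R)[j]
    rhs = SBOX[R3[j] ^ kc] ^ M_inv(L3)[j]
    return lhs == rhs


rng = random.Random(1)
rand_state = lambda: [rng.randrange(16) for _ in range(4)]
L, R = rand_state(), rand_state()
K = [rand_state() for _ in range(3)]           # K_i, K_{i+1}, K_{i+2}
L3, R3 = L, R
for k in K:
    L3, R3 = round_fn(L3, R3, k)

all_ok = all(word_match(j, L, R, L3, R3, K[0][j], K[2][j]) for j in range(4))
survivors = sum(word_match(0, L, R, L3, R3, a, c) for a in range(16) for c in range(16))
print(all_ok, survivors)
```

Note that the middle key $K_{i+1}$ never appears in `word_match`, and each word is tested with only $2\ell$ guessed key bits.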
5.2 Matrix Separation [29]

In general, for the function reduction technique, all inputs of an F-function
need to be fixed. However, in the Feistel-3 ciphers, a (partial) function
reduction can be constructed by fixing only a part of the inputs due to the linearity of the
matrix. This technique, referred to as matrix separation in this paper, gives more
degrees of freedom to the inputs.

Since $M_i$ is a linear operation, each operation can be divided into $\ell$-bit units. For
instance, we show the case of $m = 4$ as an example (see Fig. 8). Suppose that
$K_i = (K_{i,1}|K_{i,2}|K_{i,3}|K_{i,4})$, $K_{i,j} \in \{0,1\}^{\ell}$ and $L_i = (L_{i,1}|L_{i,2}|L_{i,3}|L_{i,4})$, $L_{i,j} \in
\{0,1\}^{\ell}$. If three input words $L_{i,1}$, $L_{i,2}$ and $L_{i,3}$ are fixed, only $3/4 \times n/2$ bits of
$K_i$ are linearly inserted into the $(i+1)$-th round by regarding
$T = M(S'((L_{i,1} \oplus K_{i,1})|(L_{i,2} \oplus K_{i,2})|(L_{i,3} \oplus K_{i,3}))|0^{\ell})$ as new subkey bits,
where $S'$ consists of three S-boxes and $0^{\ell}$ denotes $\ell$ bits of 0. Note that $T$ is
$(n/2)$-bit data; however, it is determined by the $(3/4 \cdot n/2)$ bits of subkeys
$K_{i,1}$, $K_{i,2}$ and $K_{i,3}$. Since $L_{i,4}$ is
not fixed, $M(0^{3/4 \cdot n/2}|s(L_{i,4} \oplus K_{i,4}))$ is non-linearly inserted into the $(i+1)$-th
round.
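
The separation relies only on the linearity of $M$: the F-function output splits into a part $T$ that is constant once three input words are fixed and a part that depends on the free word. The sketch below checks this split on a toy instance; the S-box table and the matrix (each output word is the XOR of the other three input words) are illustrative assumptions, not parameters from the paper.

```python
# Arbitrary 4-bit S-box and a hypothetical linear layer, for illustration only.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def M(x):
    t = x[0] ^ x[1] ^ x[2] ^ x[3]
    return [t ^ v for v in x]


def F(x, k):
    """SP-type F-function: M(S(x xor k)) on four 4-bit words."""
    return M([SBOX[a ^ b] for a, b in zip(x, k)])


def linear_part(fixed3, k3):
    """T = M(S'(fixed words xor key words) | 0): constant while the three words stay fixed."""
    return M([SBOX[a ^ b] for a, b in zip(fixed3, k3)] + [0])


def nonlinear_part(x4, k4):
    """M(0 | 0 | 0 | s(x4 xor k4)): the contribution of the free word."""
    return M([0, 0, 0, SBOX[x4 ^ k4]])


key = [0x3, 0xA, 0x7, 0xE]
fixed3 = [0x1, 0x2, 0x3]                      # fixed input words L_{i,1..3}
T = linear_part(fixed3, key[:3])
separable = all(
    F(fixed3 + [x4], key) == xor(T, nonlinear_part(x4, key[3]))
    for x4 in range(16)
)
print(separable)
```

Because $T$ is the same for every value of the free word, it behaves like a key-dependent constant that can be XORed into later round keys, exactly as a fully fixed F-function output would be.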


5.3 Key Recovery Attack on 7-Round Feistel-3[n] 

For the 7-round Feistel-3[n], it may seem that the function reduction can be
applied to both directions with the matching without matrix used in rounds 3 to 5.
However, this approach does not work due to the restriction on combining
the matching without matrix with the function reduction. To overcome
this problem, we utilize the partial function reduction in conjunction with the
matching without matrix.

At first, $L_1$ is fixed as $CON_1$, and $K'_1 = F(K_1 \oplus CON_1)$ is moved to $R_3$ by
converting $K_2$ and $K_4$ into $K_2 \oplus K'_1$ and $K_4 \oplus K'_1$, respectively. In addition, $R_{1L}$,
which is the left half of $R_1$ ($n/4$ bits), is also fixed as an $n/4$-bit constant $CON_L$.
Using the matrix separation technique, the partial function reduction technique
is applicable to the left half of $K'_2$, represented as $K'_{2L}$. Specifically, let an $n/2$-bit
variable $K''_2$ be $K''_2 = M(S'(K'_{2L} \oplus CON_L)|0^{n/4})$, where $S'$ consists of $m/2$
S-boxes and $0^{n/4}$ denotes $n/4$ bits of 0. Since $K''_2$ is linearly inserted in round 2
by the matrix separation, it is possible to move it to $L_7$ (see Fig. 9-(a)).

The matching without matrix technique is applied to the three consecutive
rounds from rounds 4 to 6. In the forward and backward computations, $(L_4|R_4)$
and $(L_7|R_7)$ are computable from $(K'_{2R}, K'_3)$ and $(K'_{2L}, K'_7)$, respectively. Then,
if $\ell$ bits of $K'_4$ and $K_6$ are guessed, an $\ell$-bit matching is feasible, i.e., $\mathcal{K}_{(1)} \in
\{K'_{2R}, K'_3, K'_{4,a}\}$ and $\mathcal{K}_{(2)} \in \{K'_{2L}, K'_7, K_{6,a}\}$, where $1 \le a \le m$, and $K'_{4,a}$ and
$K_{6,a}$ denote arbitrary $\ell$ bits of data of $K'_4$ and $K_6$, respectively.

Since $|\mathcal{K}_{(1)}| = |\mathcal{K}_{(2)}| = 3/2 \cdot n/2 + \ell$ and the matching size is $\ell$ bits, $N_1 =
(3n/2 + 2\ell)/\ell$ plaintext/ciphertext pairs are required to determine $\mathcal{K}_{(1)}$ and
$\mathcal{K}_{(2)}$. The complexity of determining $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ is estimated as $C_{comp} =
\max(2^{3n/4+\ell}, 2^{3n/4+\ell}) \times N_1$. After that, we are able to determine the other bits
for finding all subkey bits by using a simple MITM attack on the remaining bits of $K'_4$
and $K_6$, and on $K'_1$ and $K_5$, respectively.
[Figure 9 (diagram not reproduced). Subkey relations annotated in the figure:
$K'_1 = F(K_1 \oplus CON_1)$, $K'_2 = K'_{2L}|K'_{2R}$, $K'_4 = K_4 \oplus K'_1$,
$K''_2 = M(S'(K'_{2L} \oplus CON_L)|0^{n/4})$, $K'_3 = K_3 \oplus K''_2$, $K'_5 = K_5 \oplus K''_2$.]

Fig. 9. Key Recovery Attacks on Feistel-3 Ciphers

Therefore, the whole time complexity is estimated as $2^{3n/4+\ell} \times N_1$. Due to
$k = n$, the required complexity $2^{3k/4+\ell} \cdot N_1$ is less than $2^k$. The required data is
$N_1 = (3n/2 + 2\ell)/\ell$ chosen plaintexts, and the memory is $2^{3n/4+\ell} \cdot N_1$ words.


5.4 Key Recovery Attack on 11-Round Feistel-3[2n]

Similarly to the attack on the 7-round Feistel-3[n], chosen plaintexts in the form
of $P = (L_1|R_{1L}|R_{1R}) = (CON|CON_L|R_{1R})$ are used. Then two more rounds can
be added in both the forward and backward directions thanks to the larger master
key size. Thus, an 11-round attack is constructed. For the detailed parameters,
see Table 2 and Fig. 9.


5.5 Key Recovery Attack on 9-Round Feistel-3[3n/2]

As shown in Fig. 9-(b), for the 9-round Feistel-3[3n/2], the function reduction is
applied to both directions combined with the matching without matrix in
rounds 4 to 6, since the middle of the matching is an odd-indexed round. Thus, a
key recovery attack is constructed in a straightforward way, unlike the attacks
on Feistel-3[n] and -3[2n].

In this attack, since $|\mathcal{K}_{(1)}| = |\mathcal{K}_{(2)}| = 2 \cdot n/2 + \ell$ and the matching size is $\ell$
bits from Fig. 9-(b), $N_2 = (2n + 2\ell)/\ell$ plaintext/ciphertext pairs are required to
determine $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$. Such pairs in the form of $L_1 = CON_1$ and $R_{10} = CON_2$
are found from $N_2!^{1/N_2} \cdot (2^{n/2})^{(N_2-1)/N_2}$ chosen plaintext/ciphertext pairs. Note
that, if the number of required chosen plaintext/ciphertext pairs, which depends
on the parameters $n$ and $\ell$, is more than $2^{n/2}$, the partial function reduction
technique can be applied to $L_1$ and $R_{10}$. Otherwise, another attack approach
is required for this variant. Moreover, if the function reduction is used only in
the forward direction, an 8-round attack with extremely low data complexity is
derived (see Table 2).


5.6 Application to Reduced Camellia 

In order to demonstrate the usefulness and versatility of our approach on
Feistel-3, we apply our attack to the reduced version of the Camellia block cipher [5], which
is Camellia without $FL/FL^{-1}$ functions and key whitenings. Camellia is a
Feistel block cipher whose F-function is the SP-type F-function consisting of eight
8-bit S-boxes followed by an 8 × 8 matrix operation. Thus, our attacks on the
Feistel-3 cipher presented in the previous sections are directly applicable to the
7/9/11-round reduced Camellia-128/192/256. Note that since our attack does
not depend on the key scheduling function, the attack works for any key
scheduling function, even an ideal one. Furthermore, by exploiting the low diffusion property
of the matrix used in Camellia, we develop an advanced five round matching
technique. Then we present low-data complexity attacks requiring at most 60
plaintext/ciphertext pairs on the 8/10/12-round reduced Camellia-128/192/256
without $FL/FL^{-1}$ and key whitenings.

Five Round Matching for Non-MDS Matrix. Let us consider five consecutive
rounds of Camellia whose input and output are represented as $(L_i|R_i)$
and $(L_{i+5}|R_{i+5})$, respectively. By using the three-round matching without matrix
technique in the middle, the following equation holds.

$$S(L_{i+1} \oplus K_{i+1}) \oplus M^{-1}(L_i) = S(R_{i+4} \oplus K_{i+3}) \oplus M^{-1}(R_{i+5}).$$

Since the S-box layer consists of eight 8-bit S-boxes, by guessing two bytes of
subkeys with the same byte position, $K_{i+1,j}$ and $K_{i+3,j}$, the 8-bit matching is
possible if the same indexed 8 bits of data $L_{i+1,j}$ and $R_{i+4,j}$ are also known. Since
$L_{i+1} = M(S(L_i \oplus K_i)) \oplus R_i$ and $R_{i+4} = M(S(R_{i+5} \oplus K_{i+4})) \oplus L_{i+5}$, all bits of $K_i$
and $K_{i+4}$ are required to be guessed to obtain any byte of $L_{i+1}$ and $R_{i+4}$ if the
underlying matrix $M$ is optimal (i.e., an MDS matrix). However, for Camellia, the
8 bits of data $L_{i+1,j}$ and $R_{i+4,j}$ are derived by guessing the corresponding 40 (= 8 × 5)
bits of $K_i$ and $K_{i+4}$ when $5 \le j \le 8$, since Camellia utilizes a non-MDS matrix
(see [5] for the details of the matrix used in Camellia). For example, $L_{i+1,5}$
and $R_{i+4,5}$ are derived from $K_{i,p}$ $(p \in \{1, 2, 6, 7, 8\})$ and $K_{i+4,q}$ $(q \in \{1, 2, 6, 7, 8\})$,
respectively. Therefore, the number of key bits to be guessed for the 5-round
matching in each direction is reduced from 128 bits (= 64 × 2) to 48 bits (= 8
+ 40).
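
The byte dependencies used above can be read off Camellia's linear layer. The sketch below encodes the P-function as given in the public Camellia specification (RFC 3713), under the assumption that its byte numbering coincides with the indexing in the text; the example for byte 5 agrees. The helper name is illustrative.

```python
# Camellia P-function: output byte j is the XOR of the listed S-box output bytes
# (byte numbering as in RFC 3713; assumed to match the indexing used in the text).
P_ROWS = {
    1: (1, 3, 4, 6, 7, 8),
    2: (1, 2, 4, 5, 7, 8),
    3: (1, 2, 3, 5, 6, 8),
    4: (2, 3, 4, 5, 6, 7),
    5: (1, 2, 6, 7, 8),
    6: (2, 3, 5, 7, 8),
    7: (3, 4, 5, 6, 8),
    8: (1, 4, 5, 6, 7),
}


def outer_key_bits(j):
    """Key bits of the outer round needed to compute output byte j of M(S(x xor K))."""
    return 8 * len(P_ROWS[j])


for j in range(1, 9):
    print(j, P_ROWS[j], outer_key_bits(j))

# Per direction: one byte of the inner key plus the outer-round bytes feeding byte j.
per_direction = {j: 8 + outer_key_bits(j) for j in P_ROWS}
```

Rows 5 to 8 have only five terms, which gives the 40 outer key bits and the total of 48 bits per direction; rows 1 to 4 would require 48 outer key bits instead.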
[Figure 10, panel recovered from the extraction: (c) 12-round Attack on Camellia-256]

Fig. 10. Key Recovery Attacks on Reduced Camellia-128/192/256

Key Recovery Attack on 8-Round Reduced Camellia-128. Let us consider
the 8-round reduced Camellia-128. In order to use the function reduction
technique in the forward process, we collect chosen plaintexts in the form of
$L_1 = CON_1$.

The five round matching for non-MDS matrix technique is used from rounds
3 to 7. In the forward and backward computations, $(L_3|R_3)$ and $(L_8|R_8)$ are
computable by using $K'_2$ $(= K_2 \oplus K'_1)$ and $K_8$, respectively. Then, for the 8-bit
matching, 8 bits of subkey $K'_{4,a}$ and the corresponding 40 bits of subkey $K_3$ in the
forward computation are required to be guessed, where $K'_4 = K_4 \oplus K'_1$. Similarly,
we need to guess 8 bits of subkey $K_{6,a}$ and the corresponding 40 bits of subkey $K_7$
in the backward computation. In other words, $\mathcal{K}_{(1)} \in \{K'_2, K'_{4,a}, \text{40 bits of } K_3\}$
and $\mathcal{K}_{(2)} \in \{K_8, K_{6,a}, \text{40 bits of } K_7\}$, where $5 \le a \le 8$.

Since $|\mathcal{K}_{(1)}| = |\mathcal{K}_{(2)}| = 112$ and the matching size is 8 bits, 28 (= (112 + 112)/8)
plaintext/ciphertext pairs are sufficient to determine $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$. The complexity
of determining $\mathcal{K}_{(1)}$ and $\mathcal{K}_{(2)}$ is estimated as $C_{comp} = \max(2^{112}, 2^{112}) \times 28 \approx 2^{117}$.
After that, we are able to determine the other bits for finding all subkeys by using
the simple MITM attack on the remaining 24 bits of $K_3$ and $K_7$, and 56 bits of
$K'_4$ and $K_6$ in the forward and backward computations, respectively. Therefore,
the whole complexity is estimated as $2^{117}$ $(\approx 2^{117} + 2^{80})$. The required memory
Table 3. Comparisons of Key Recovery Attacks on Reduced Camellia-128/192/256
without FL/FL^{-1} Functions and Key Whitenings

Target        # Attacked Rounds  Attack Type              Time       Memory     Data       Reference
Camellia-128  12                 Impossible Differential  2^{116.6}  not given  2^{116.3}  [28]
              8                  Meet-in-the-Middle       2^{117}    2^{117}    28         Sect. 5.6
Camellia-192  14                 Impossible Differential  2^{182.2}  not given  2^{117}    [27]
              10                 Meet-in-the-Middle       2^{190}    2^{174}    44         Sect. 5.6
Camellia-256  16                 Impossible Differential  2^{249}    not given  2^{123}    [27]
              12                 Meet-in-the-Middle       2^{246}    2^{246}    60         Sect. 5.6

is $2^{117}$ words, and the required data is only 28 chosen plaintext/ciphertext pairs
(see Fig. 10-(a)).
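
The pair counts and time estimates quoted for the attacks follow one arithmetic pattern: enough pairs so that the total matched bits cover both key sets, and time equal to the larger key set times the number of pairs. A short sketch of that arithmetic, with illustrative function names, is:

```python
import math


def pairs_needed(k1_bits, k2_bits, match_bits):
    """Smallest number of pairs N with N * match_bits >= |K(1)| + |K(2)|."""
    return math.ceil((k1_bits + k2_bits) / match_bits)


def log2_time(k1_bits, k2_bits, n_pairs):
    """log2 of max(2^|K(1)|, 2^|K(2)|) * N."""
    return max(k1_bits, k2_bits) + math.log2(n_pairs)


# 8-round Camellia-128: |K(1)| = |K(2)| = 112, 8-bit matching
n128 = pairs_needed(112, 112, 8)
print(n128, log2_time(112, 112, n128))

# 8-round CAST-128: |K(1)| = 111, |K(2)| = 114, 32-bit matching
print(pairs_needed(111, 114, 32))
```

For Camellia-128 this gives 28 pairs and a logarithm just under 117, consistent with the rounded estimate $2^{117}$ in the text.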


Key Recovery Attack on 12-Round Reduced Camellia-256. Similarly
to the attack on the reduced Camellia-128, for the reduced Camellia-256, the
five round matching for non-MDS matrix technique is used. Since two more
rounds can be appended in each direction, a 12-round attack is constructed (see
Fig. 10-(c) and Table 3).


Key Recovery Attack on 10-Round Reduced Camellia-192. In this attack,
in order to utilize the function reduction technique in conjunction with
the matrix separation technique, we collect chosen plaintexts in the form of
$L_1 = CON_1$ and $R_{1,1\text{-}7} = CON_L$, where $R_{1,1\text{-}7}$ denotes the left 56 bits of $R_1$
and $CON_L$ is a 56-bit constant. Then $K'_1 = F(CON_1 \oplus K_1)$ is moved to $R_7$ by
redefining $K'_p = K_p \oplus K'_1$ $(p = 2, 4, 6)$. In addition, the left 56 bits of $K'_2$, defined
as $K'_{2,1\text{-}7}$, are also moved to $R_{10}$ by using the partial function reduction technique.
Namely, we assume that $K''_2 = M(S'(K'_{2,1\text{-}7} \oplus CON_L)|0^8)$ is linearly inserted
in round 2 and the remaining 8-bit subkey $K'_{2,8}$ is non-linearly inserted in round
2, where $S'$ consists of seven 8-bit S-boxes.

The five round matching for non-MDS matrix technique is used from rounds
5 to 9. Here, $(L_5|R_5)$ and $(L_{10}|R_{10})$ are computed from $\{K'_{2,8}, K'_3\ (= K_3 \oplus
K''_2), K'_4\ (= K_4 \oplus K'_1)\}$ and $(K'_{2,1\text{-}7}, K_{10})$, respectively. For the 8-bit matching,
$K'_{6,8}$ and the corresponding 40 bits of $K'_5$ $(= K_5 \oplus K''_2)$ are required to be
guessed in the forward computation, where $K'_6 = K_6 \oplus K'_1$. Similarly, in the
backward computation, $K_{8,8}$ and the corresponding 40 bits of $K'_9$ $(= K_9 \oplus K''_2)$
are required to be guessed. Namely, $\mathcal{K}_{(1)} \in \{K'_{2,8}, K'_3, K'_4, K'_{6,8}, \text{40 bits of } K'_5\}$
and $\mathcal{K}_{(2)} \in \{K'_{2,1\text{-}7}, K_{10}, K_{8,8}, \text{40 bits of } K'_9\}$. In this attack, the whole complexity
to determine all subkey bits is estimated as $2^{190}$ $(\approx 2^{190} + 2^{80})$. The required
memory is $2^{174}$ $(\approx 2^{168} \times 44)$ words, and the required data is only 44 chosen
plaintext/ciphertext pairs (see Fig. 10-(b) and Table 3).
6 Discussion

In order to compare the numbers of rounds attacked by our attacks with the
previous results, we consider key recovery attacks from a 5-round impossible
differential distinguisher or a 5-round zero-correlation linear distinguisher on
the Feistel ciphers employing bijective F-functions [6,11]. Note that those
distinguishers depend only on the structure of the cipher, unlike other distinguishers
such as a differential and a linear distinguisher. When $k = n$, by guessing the $n/2$-bit
subkey involved in the 6-th round, it is possible to construct a 6-round key
recovery attack from the 5-round distinguishers. Similarly, for $k = 3n/2$ and $k = 2n$,
7- and 8-round key recovery attacks are constructed by additionally guessing
$n/2$ and $n$ bits of subkeys, respectively. Compared to those results, our attacks
are the best attacks with respect to the number of attacked rounds for
Feistel-2[2n], -3[n], -3[3n/2] and -3[2n] as described in Table 2. Also, for Feistel-1[2n]
and Feistel-2[3n/2], the same numbers of rounds are attacked by our approach.
In particular, the attack on the 11-round Feistel-3[2n] greatly exceeds the number
of attacked rounds given by the distinguisher-based attacks. More importantly,
the Feistel-3[2n] structure is widely used in concrete block ciphers such as a 128-bit
block cipher taking a 256-bit key, e.g., Camellia-256.

In addition, thanks to the MITM approach, most of our attacks require an
extremely small data complexity, in contrast to the classical statistical attacks
such as the impossible differential and zero-correlation linear attacks that
generally require a huge amount of data. This implies that our attacks may work even
if the number of queries to the encryption oracle is restricted. In fact, a similar
approach, namely low-data complexity attacks on AES, has already been
studied in [13,14]. Thus, our work can also be regarded as the first evaluation
of low-data complexity attacks on the Feistel schemes.

7 Conclusion 

This paper has shown the improved generic key recovery attacks on Feistel 
schemes independent of the key scheduling function. The proposed approach 
is based on the all subkeys recovery attack. With several advanced techniques 
such as function reduction and key linearization, which basically reduce the num- 
ber of involved subkey bits, we presented several new key recovery attacks on 
the Feistel schemes. 

To demonstrate the usefulness and the versatility of our approach, we showed
several attacks on concrete block ciphers including CAST-128 and Camellia.
Among them, we would like to stress that the presented attack on the 8-round
reduced CAST-128 with a key of more than 118 bits is the best attack with respect
to the number of attacked rounds. Since our approach is generic, it is expected
to be applicable to other Feistel-type block ciphers. We believe that our results are
useful not only for a deeper understanding of the security of the Feistel schemes,
but also for designing an efficient block cipher such as a low-latency cipher.
Moreover, we expect that our attacks could be improved by combining them with the
recent attack called the sieve-in-the-middle attack [17].


484 T. Isobe and K. Shibutani 


References 

1. Adams, C.: The CAST-128 encryption algorithm. RFC-2144 (May 1997) 

2. Adams, C.: Constructing symmetric ciphers using the CAST design procedure. 
Des. Codes Cryptography 12(3), 283-316 (1997) 

3. Aoki, K., Sasaki, Y.: Preimage attacks on one-block MD4, 63-step MD5 and more. 
In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 
103-119. Springer, Heidelberg (2009) 

4. Aoki, K., Guo, J., Matusiewicz, K., Sasaki, Y., Wang, L.: Preimages for step- 
reduced SHA-2. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 
578-597. Springer, Heidelberg (2009) 

5. Aoki, K., Ichikawa, T., Kanda, M., Matsui, M., Moriai, S., Nakajima, J., Tokita, 
T.: Camellia: A 128-bit block cipher suitable for multiple platforms - design and 
analysis. In: Stinson, D.R., Tavares, S. (eds.) SAC 2000. LNCS, vol. 2012, pp. 
39-56. Springer, Heidelberg (2001) 

6. Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of Skipjack reduced to 31 
rounds using impossible differentials. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, 
vol. 1592, pp. 12-23. Springer, Heidelberg (1999) 

7. Biham, E., Shamir, A.: Differential cryptanalysis of Snefru, Khafre, REDOC-II, 
LOKI and Lucifer. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 
156-171. Springer, Heidelberg (1992) 

8. Biryukov, A., Nikolic, I.: Complementing Feistel ciphers. In: FSE 2013. LNCS. 
Springer (2013) 

9. Bogdanov, A. A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, 
M., Seurin, Y., Vikkelsoe, C.: PRESENT: An ultra-lightweight block cipher. In: 
Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 450-466. 
Springer, Heidelberg (2007) 

10. Bogdanov, A., Rechberger, C.: A 3-Subset meet-in-the-middle attack: Cryptanaly- 
sis of the lightweight block cipher KTANTAN. In: Biryukov, A., Gong, G., Stinson, 
D.R. (eds.) SAC 2010. LNCS, vol. 6544, pp. 229-240. Springer, Heidelberg (2011) 

11. Bogdanov, A., Rijmen, V.: Linear hulls with correlation zero and linear cryptanal- 
ysis of block ciphers. IACR Cryptology ePrint Archive, vol. 2011, p. 123 (2011) 

12. Borghoff, J., Canteaut, A., Güneysu, T., Kavun, E.B., Knezevic, M., Knudsen,
L.R., Leander, G., Nikov, V., Paar, C., Rechberger, C., Rombouts, P., Thomsen,
S.S., Yalçın, T.: PRINCE - A low-latency block cipher for pervasive computing
applications - extended abstract. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012.
LNCS, vol. 7658, pp. 208-225. Springer, Heidelberg (2012)

13. Bouillaguet, C., Derbez, P., Dunkelman, O., Keller, N., Rijmen, V., Fouque, P.-A.: 
Low data complexity attacks on AES. IEEE Transactions on Information The- 
ory 58(11), 7002-7017 (2012) 

14. Bouillaguet, C., Derbez, P., Fouque, P.-A.: Automatic search of attacks on round- 
reduced AES and applications. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, 
vol. 6841, pp. 169-187. Springer, Heidelberg (2011) 

15. Bouillaguet, C., Dunkelman, O., Leurent, G., Fouque, P.-A.: Another look at com- 
plementation properties. In: Hong, S., Iwata, T. (eds.) FSE 2010. LNCS, vol. 6147, 
pp. 347-364. Springer, Heidelberg (2010) 

16. De Canniere, C., Dunkelman, O., Knezevic, M.: KATAN and KTANTAN — A 
family of small and efficient hardware-oriented block ciphers. In: Clavier, C., Gaj, 
K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 272-288. Springer, Heidelberg (2009) 

17. Canteaut, A., Naya-Plasencia, M., Vayssiere, B.: Sieve-in-the-middle: Improved 
MITM attacks. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part I. LNCS, 
vol. 8042, pp. 222-240. Springer, Heidelberg (2013) 



Generic Key Recovery Attack on Feistel Scheme 485 


18. Dinur, I., Dunkelman, O., Shamir, A.: Improved attacks on full GOST. In: Canteaut, 
A. (ed.) FSE 2012. LNCS, vol. 7549, pp. 9-28. Springer, Heidelberg (2012) 

19. FIPS, Data Encryption Standard. Federal Information Processing Standards Pub- 
lication 46 

20. Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.: The LED block cipher. In: 
Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 326-341. Springer, 
Heidelberg (2011) 

21. Isobe, T.: A single-key attack on the full GOST block cipher. J. Cryptology 26(1), 
172-189 (2013) 

22. Isobe, T., Shibutani, K.: All subkeys recovery attack on block ciphers: Extending 
meet-in-the-middle approach. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, 
vol. 7707, pp. 202-221. Springer, Heidelberg (2013) 

23. Jean, J., Nikolic, I., Peyrin, T., Wang, L., Wu, S.: Security analysis of PRINCE. 
In: Pre-proceeding of FSE 2013. LNCS. Springer (2013) 

24. Knezevic, M., Nikov, V., Rombouts, P.: Low-latency encryption - is “Lightweight 
= light + wait”? In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol. 7428, 
pp. 426-446. Springer, Heidelberg (2012) 

25. Knudsen, L.R.: DEAL - a 128-bit block cipher. Technical Report 151, University 
of Bergen, Department of Informatics, Norway (February 1998) 

26. Knudsen, L.R., Rijmen, V.: Known-key distinguishers for some block ciphers. In: 
Kurosawa, K. (ed.) ASIACRYPT 2007. LNCS, vol. 4833, pp. 315-324. Springer, 
Heidelberg (2007) 

27. Lu, J., Wei, Y., Kim, J., Fouque, P.-A.: Cryptanalysis of reduced versions of the 
Camellia block cipher. In: Pre-Proceedings of SAC 2011 (2011) 

28. Mala, H., Shakiba, M., Dakhilalian, M., Bagherikaram, G.: New results on im- 
possible differential cryptanalysis of reduced-round Camellia-128. In: Jacobson Jr., 
M.J., Rijmen, V., Safavi-Naini, R. (eds.) SAC 2009. LNCS, vol. 5867, pp. 281-294. 
Springer, Heidelberg (2009) 

29. Ohtahara, C., Okada, K., Sasaki, Y., Shimoyama, T.: Preimage attacks on full- 
ARIRANG: Analysis of DM-mode with middle feed-forward. In: Jung, S., Yung, 
M. (eds.) WISA 2011. LNCS, vol. 7115, pp. 40-54. Springer, Heidelberg (2012) 

30. Patarin, J.: Security of random Feistel schemes with 5 or more rounds. In: Franklin, 
M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 106-122. Springer, Heidelberg (2004) 

31. Sasaki, Y.: Meet-in-the-middle preimage attacks on AES hashing modes and an 
application to Whirlpool. In: Joux, A. (ed.) FSE 2011. LNCS, vol. 6733, pp. 378- 
396. Springer, Heidelberg (2011) 

32. Sasaki, Y.: Preimage attacks on Feistel-SP functions: Impact of omitting the last 
network twist. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) 
ACNS 2013. LNCS, vol. 7954, pp. 170-185. Springer, Heidelberg (2013) 

33. Sasaki, Y., Yasuda, K.: Known-key distinguishers on 11-round Feistel and collision 
attacks on its hashing modes. In: Joux, A. (ed.) FSE 2011. LNCS, vol. 6733, pp. 
397-415. Springer, Heidelberg (2011) 

34. Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: 
An ultra-lightweight blockcipher. In: Preneel, B., Takagi, T. (eds.) CHES 2011. 
LNCS, vol. 6917, pp. 342-357. Springer, Heidelberg (2011) 

35. Soleimany, H., Blondeau, C., Yu, X., Wu, W., Nyberg, K., Zhang, H., Zhang, L., 
Wang, Y.: Reflection cryptanalysis of PRINCE-like ciphers. In: Pre-proceeding of 
FSE 2013. LNCS. Springer (2013) 

36. Suzuki, K., Tonien, D., Kurosawa, K., Toyota, K.: Birthday paradox for multi- 
collisions. In: Rhee, M.S., Lee, B. (eds.) ICISC 2006. LNCS, vol. 4296, pp. 29-40. 
Springer, Heidelberg (2006) 



Does My Device Leak Information? An a priori 
Statistical Power Analysis of Leakage Detection 
Tests 


Luke Mather, Elisabeth Oswald, Joe Bandenburg, and Marcin Wojcik 


University of Bristol, Department of Computer Science, 
Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK 
{Luke.Mather, Elisabeth.Oswald, Marcin.Wojcik}@bris.ac.uk, 
joe@bandenburg.com 


Abstract. The development of a leakage detection testing methodology 
for the side-channel resistance of cryptographic devices is an issue that 
has received recent focus from standardisation bodies such as NIST. Sta- 
tistical techniques such as hypothesis and significance testing appear to 
be ideally suited for this purpose. In this work we evaluate the candi- 
dacy of three such detection tests: a t-test proposed by Cryptography 
Research Inc., and two mutual information-based tests, one in which data 
is treated as continuous and one as discrete. Our evaluation investigates 
three particular areas: statistical power, the effectiveness of multiplicity 
corrections, and computational complexity. To facilitate a fair compar- 
ison we conduct a novel a priori statistical power analysis of the three 
tests in the context of side-channel analysis, finding surprisingly that the 
continuous mutual information and t-tests exhibit similar levels of power. 

We also show how the inherently parallel nature of the continuous mu- 
tual information test can be leveraged to reduce a large computational 
cost to insignificant levels. To complement the a priori statistical power 
analysis we include two real-world case studies of the tests applied to 
software and hardware implementations of the AES. 

1 Introduction 

The evaluation of the resilience of cryptographic devices against side-channel 
adversaries is an issue of increasing importance. The potential of side-channel 
analysis (SCA) as an attack vector is driving the need for standards organisations 
and governing bodies to establish an acceptance-testing methodology capable of 
robustly assessing the vulnerability of devices; the National Institute of
Standards and Technology (NIST) held a workshop in 2011 driving the requirements
[?], and recent papers have been published on this topic by industry [?].

Current evaluation methodologies such as Common Criteria [2], used by bodies
such as ANSSI [?] and BSI [3], consist of executing a battery of known
side-channel attacks on a device and considering whether the attack succeeds
and, if so, the quantity of resources expended by an adversary to break the
device. This methodology is likely to prove unsustainable in the long term: the
number and variety of Simple Power Analysis (SPA) and, particularly,
Differential Power Analysis (DPA) attacks are steadily increasing year on year,
lengthening the testing process and forcing evaluation bodies to keep up to
date with an increasingly large, technically complex and diverse number of
researched strategies.

A desirable complement or alternative to an attack-focused evaluation strategy
is to take a ‘black-box’ approach: rather than attempting to assess security by
trying to find the data or computational complexity of an optimal adversary
against a specific device, we can attempt to determine whether the power
consumption data contains any side-channel information about underlying
secrets, without having to precisely characterise and exploit leakage
distributions. We describe this as a detection strategy; the question any
detection test answers is whether any side-channel information is present, not
how much there is or how much of it is exploitable. Detection-based strategies
can be used to support ‘pass or fail’ type decisions about the security of a
device [?], or can be used to identify time points that warrant further
investigation.

In practice we estimate information leakage, and so any reasonable detection
strategy should ideally incorporate a degree of statistical rigour. In this
paper we provide a comprehensive evaluation of three leakage detection
hypothesis tests in the context of power analysis attacks: a t-test proposed by
[?], and two tests for detecting the presence of zero mutual information (MI),
one in which power traces are treated as continuous data (hereafter the CMI
test) [10], and one as discrete (hereafter the DMI test) [9].

Our contribution. Previous work in the context of side-channel analysis has
assessed detection tests through practical experimentation only [?]. This
approach creates flawed comparisons of tests for reasons similar to those
encountered in the practical analysis of distinguishers in DPA [55]; the
effects of sample size and estimation error on detection test performance
cannot be quantified in a practical experiment and consequently it becomes
difficult to draw fair comparisons that apply in a general context. To ensure a
fair comparison in this work we perform an a priori statistical power analysis¹
of the three detection tests using a variety of practically relevant
side-channel analysis scenarios. The analysis allows us to study the effects
that sample size, leakage functions, noise and other hypothesis testing
criteria have on the performance of the detection tests in a fair manner. In
addition to statistical power, we also investigate the computational complexity
of the tests and the effectiveness of multiplicity corrections.

Related work. An alternative to the black-box strategy is the ‘white-box’
leakage evaluation methodology proposed by Standaert et al. [55]. Their
methodology requires an estimation of the conditional entropy of a device’s
leakage distribution using an estimated leakage model. This allows for a
tighter bound on the amount of information available to an adversary, but
requires additional computational expense and the ability to profile a device,
and bounding estimation error in the results is non-trivial. The black-box
detection approach outlined in this work does not require any device profiling,
trading off the ability to estimate the exploitable information leakage
contained within the device for efficiency gains and the ability to increase
robustness through statistical hypothesis testing. A detection strategy may be
used as a complement to the approach of Standaert et al. by identifying a
subset of time points that are known to leak information and can be further
explored in a white-box analysis.

1 The overlap in terminology of the statistical power analysis of hypothesis
tests with the entirely different differential or simple power analysis
technique is unfortunate. To establish a reasonable separation of terminology
we will use ‘DPA’ or ‘SPA’ to address the latter technique, and ‘statistical
power’ when referencing the former.

There is no previous a priori power analysis study of these three tests in the
context of SCA. A generic analysis of the CMI test and additional
non-parametric hypothesis tests was conducted in [10], but does not consider
the influence of variables such as noise and leakage function in the context of
side-channel analysis, and cannot be used in comparison with the DMI or
t-tests.

Organisation. In Section 4 of this work we present the results of the first a
priori statistical power analysis of the three detection tests in the context
of side-channel analysis. To support the a priori analysis we also provide a
case study illustrating an example application of the tests to real-world
traces acquired from a software and a hardware implementation of the AES in
Section [5]. Section [5] discusses the computational complexity of the three
tests.

2 Introduction to Selected Hypothesis Tests 

2.1 Side-Channel Analysis 

We will consider a ‘standard’ SCA scenario whereby the power consumption $T$ of
a device is dependent on the value of some internal function $f_k(x)$ of
plaintexts and secret keys evaluated by the device. Using the random variable
$X \in \mathcal{X}$ to represent a plaintext and the random variable
$K \in \mathcal{K}$ to represent a sub-key, the power consumption $T$ of the
device can be modelled using $T = L \circ f_k(x) + \varepsilon$, where $L$ is a
function that describes the data-dependent component of the power consumption
and $\varepsilon$ represents the remaining component of the power consumption
modelled as additive random noise.
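
As a minimal sketch of this model, the following simulates (plaintext, trace) pairs for a single time point. The choices made here are illustrative assumptions rather than anything fixed by the text: the target function is taken to be $f_k(x) = x \oplus k$ on one byte (no S-box), $L$ defaults to the Hamming weight, and the noise is drawn as zero-mean Gaussian with standard deviation `sigma`.

```python
import random


def hamming_weight(v: int) -> int:
    """Number of set bits in v."""
    return bin(v).count("1")


def simulate_traces(key, n, sigma, leak=hamming_weight,
                    f=lambda x, k: x ^ k, seed=0):
    """Draw n (plaintext, power value) pairs from T = L(f_k(x)) + eps.

    Assumptions for illustration only: f is a plain XOR with the key byte,
    and eps is Gaussian noise with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.randrange(256)                        # uniformly random plaintext byte
        t = leak(f(x, key)) + rng.gauss(0.0, sigma)   # data-dependent part + noise
        pairs.append((x, t))
    return pairs
```

With `sigma = 0.0` every simulated power value equals the noiseless leakage of the intermediate state; increasing `sigma` hides it progressively.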

2.2 Candidate Tests 

There are many hypothesis tests that may be used to detect information leakage:
one can test for differences between particular moments (such as the mean) of
leakage distributions, or one can test for any general differences between
leakage distributions. In this work we consider three tests, one from the
former category and two from the latter. In the former category, the Welch
t-test [27], used to assess the difference between the means of two
distributions, has been proposed by Cryptography Research Inc. [?]. One can
also analyse higher moments using tests such as the F-test [2U]. Information
leakage solely occurring in a particular higher moment is rare (to our
knowledge, one example of this is in [2U]) and so a natural progression is to
use a generic non-parametric test instead. Chatzikokolakis et al. and Chothia
et al. present hypothesis tests capable of detecting the presence of discrete
and continuous mutual information [9,10].

Whilst alternative non-parametric tests are available, mutual information-based
methods provide an intuitive measure and are frequently used in other contexts
[23,26]. There is a generic a priori power analysis comparing the CMI test and
additional non-parametric hypothesis tests in [10], finding that the CMI test
compared favourably. The analysis does not discuss any of the side-channel
specific variables described in Section 2.1 and cannot be used in comparison
with the t-test, but does suggest that an MI-based test is a natural choice for
a generic test candidate. As such, we focus on the t-test and the two MI-based
methods, and note that our evaluation strategy can be easily applied to other
detection tests in the future.

The null hypothesis for any hypothesis testing procedure used in a detection
context is that there is no information leakage: using the t-test, any
statistically significant difference of means is evidence for an information
leak, and using MI-based tests, any significant non-zero mutual information is
evidence.

The generic strategy followed by each test is to systematically evaluate each
individual time point in a set of traces in turn. This is a ‘univariate’
approach, and in many cases is likely to be sufficient; vulnerabilities arising
from sub-optimal security measures are likely to manifest themselves as leakage
detectable within a single time point. To detect leakage exploitable by n-th
order attacks would necessitate the joint comparison of n time points. This
results in a considerable increase in the amount of computation required (the
brute force strategy would be to analyse the joint distribution of every
possible n-tuple of points) and additionally can substantially increase the
complexity of the test statistics, with multivariate mutual information in
particular becoming costly. Whilst an efficient multivariate strategy would be
desirable, it is beyond the scope of this initial work.

2.3 Difference-of-means and the t-test 

Exploiting the difference-of-means $\bar{T}_1 - \bar{T}_2$ between two sets of
power traces $T_1$ and $T_2$ partitioned on a single bit of a targeted
intermediate state was proposed by Kocher et al. and is the canonical example
of a generic DPA attack [17]. The same difference-of-means can also be used to
detect information leakage, and was proposed as a candidate detection test in
[?].

Welch’s t-test is a hypothesis test that (in the two-tailed case) tests the null
hypothesis that the population means of two variables are equal, where the
variables have possibly unequal variances, yielding a p-value that may or may
not provide sufficient evidence to reject this hypothesis. The test statistic
$t$ is:

$$t = \frac{\bar{T}_1 - \bar{T}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} \qquad (1)$$

where $\bar{T}_i$, $s_i^2$ and $N_i$ are the sample mean, sample variance and
sample size of the $i$-th set $T_i$. Using this test statistic and the
Welch-Satterthwaite equation² to compute the degrees of freedom $\nu$, a
p-value can be computed to determine whether there is sufficient evidence to
reject the null hypothesis at a particular significance level $\alpha$
(equivalently, confidence level $1 - \alpha$). Using the quantile function for
the t distribution at a significance level $\alpha$ and with $\nu$ degrees of
freedom, a confidence interval for the difference-of-means can also be
computed.
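
The statistic and its degrees of freedom can be sketched in a few lines. This is an illustrative implementation using only the Python standard library, with the function name `welch_t` chosen here; it stops short of the p-value because the standard library has no t-distribution CDF, so that step would need a statistics package or a table.

```python
import math
import statistics


def welch_t(t1, t2):
    """Return (t, nu): Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    t1 and t2 are the two sets of power values at one time point,
    partitioned on the targeted bit.
    """
    n1, n2 = len(t1), len(t2)
    m1, m2 = statistics.fmean(t1), statistics.fmean(t2)
    v1, v2 = statistics.variance(t1), statistics.variance(t2)  # unbiased sample variances
    se2 = v1 / n1 + v2 / n2                                    # squared standard error
    t = (m1 - m2) / math.sqrt(se2)
    nu = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, nu
```

For example, `welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` gives $t \approx -1.897$ with $\nu \approx 5.88$.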

Leveraging the t-test requires a partitioning of the traces based on the value
of a particular bit of an intermediate state within the targeted algorithm, and
therefore to comprehensively evaluate a device every single bit of every single
intermediate state must be tested. To assess the i-th bit of a particular state
for leakage (e.g. the output of SubBytes in a particular round), an evaluator
must compute the intermediate values for the chosen state, using a set of
chosen messages. Having recorded the encryption or decryption of the chosen
messages, the resulting traces can be partitioned into two sets $T_1$ and
$T_2$, depending on the value of the i-th bit of the intermediate state. The
test statistic $t$ and corresponding p-values or confidence intervals can then
be used to determine whether a difference between the means exists.

The t-test by design can only detect differences between subkeys that are
contained within the mean of the leakage samples, and assumes that the
populations being compared are normally distributed. In practice univariate
leakage from unprotected devices is typically close enough to Gaussian for this
condition to not be too restrictive [7,8,17].

2.4 Mutual Information 

Given two random variables $X$ and $Y$, the MI $I(X;Y)$ computes the average
information gained about $X$ if we observe $Y$ (and vice-versa). The
application of MI to detecting information leaks from a cryptographic device is
straightforward: any dependence between subkeys and the power consumed by the
device, giving $I(K;T) > 0$, may be evidence for an exploitable information
leak³.

The rationale for using MI to detect information leaks is that it compares 
distributions in a general way, incorporating all linear and non-linear dependen- 
cies between sub-keys and power values. Unfortunately, the estimation of MI 
is well-known to be a difficult problem. There are no unbiased estimators, and
it has been proven that the performance of every estimator depends on the
underlying structure of the data [?].

Recent results on the behaviour of zero MI can help to alleviate this problem.
Chatzikokolakis et al. find the sampling distribution of MI between two
discrete random variables when it is zero, where the distribution of one of the
variables is known and the other unknown, and use this to construct a
confidence interval test [9]. A second result from Chothia and Guha establishes
a rate of convergence, under reasonable assumptions, for the sampled estimate
for zero MI between one discrete random variable with a known distribution and
one continuous random variable with an unknown distribution [10]. This result
is then used to construct a non-parametric hypothesis test to assess whether
sampled data provides evidence of an information leak within a system.

2 Using Welch-Satterthwaite, the degrees of freedom $\nu$ for a t-distribution
can be calculated as
$\nu = \frac{(s_1^2/N_1 + s_2^2/N_2)^2}{(s_1^2/N_1)^2/(N_1 - 1) + (s_2^2/N_2)^2/(N_2 - 1)}$.

3 Under the assumption of the ‘equal images under different sub-keys’ property
[5T] we can safely compute $I(X;T)$, if simpler.

Discrete mutual information. As side-channel measurements are typically sampled
using digital equipment, it may be viable to treat the sampled data as
discrete. The most common way to make continuous data discrete is to split the
continuous domain into a finite number of bins. Using the standard formula for
marginal and conditional entropy, the discrete MI estimate can be computed as

$$\hat{I}(K;T) = H(K) - H(K|T) = \sum_{k \in \mathcal{K}} \sum_{t \in \mathcal{T}} \Pr\{k,t\} \log \frac{\Pr\{k,t\}}{\Pr\{k\}\Pr\{t\}} \qquad (2)$$

The test of Chatzikokolakis et al. is biased by $(I - 1)(J - 1)/2n$, where $I$
and $J$ are the sizes of the distribution domains of the two random variables
in question, and $n$ is the number of samples acquired. In our context,
$I = |\mathcal{K}|$, the number of possible sub-keys, and $J = |\mathcal{T}|$,
the number of possible power values as a result of discretisation.
Consequently, the point estimate $e$ for MI is the estimated value minus this
bias: $e = \hat{I}(K;T) - (I - 1)(J - 1)/2n$. We can use this to compute
$100(1 - \alpha)\%$ confidence intervals for zero and non-zero MI (full details
can be found in [9]).

As a result of the bias of the test, to be sure of good results it is necessary 
to ensure that the number of traces sampled is larger than the product of the 
number of sub-keys and the number of possible power values. The applicability 
of this discrete test is then dictated by the ability of an evaluator to sample 
enough traces to meet this condition. 
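
The discretisation and bias correction can be illustrated as follows. This is a sketch under stated assumptions, not the procedure of [9]: the binning is equal-width, the function names are chosen here, and the MI is computed with natural logarithms so that the correction $(I - 1)(J - 1)/2n$ can be subtracted exactly as written above (for an estimate in bits the correction would need rescaling). The confidence-interval step is omitted.

```python
import math
from collections import Counter


def discretise(traces, num_bins):
    """Equal-width binning of continuous power values into bin indices."""
    lo, hi = min(traces), max(traces)
    width = (hi - lo) / num_bins or 1.0   # avoid a zero width for constant data
    return [min(int((t - lo) / width), num_bins - 1) for t in traces]


def corrected_mi(keys, bins, num_keys, num_bins):
    """Plug-in MI estimate (natural-log units) minus the (I-1)(J-1)/2n bias."""
    n = len(keys)
    joint = Counter(zip(keys, bins))
    count_k, count_t = Counter(keys), Counter(bins)
    mi = 0.0
    for (k, t), c in joint.items():
        # (c/n) * log( Pr{k,t} / (Pr{k} Pr{t}) ) with empirical frequencies
        mi += (c / n) * math.log(c * n / (count_k[k] * count_t[t]))
    return mi - (num_keys - 1) * (num_bins - 1) / (2 * n)
```

When keys and binned values are independent the plug-in estimate is close to zero and the corrected value can be slightly negative, which is why the number of traces needs to be large relative to the product of the two domain sizes.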

Continuous mutual information. The test of Chothia and Guha requires two
assumptions about the data to guarantee a convergence result for zero MI [10].
The first is that the power values are continuous, real-valued random variables
with finite support. This may or may not hold theoretically, depending on the
distribution of the leakages, but in practice will be true; the sampling
resolution used dictates the range of the recorded power consumption. The
second is that for $u \in \{0, 1\}$, the probability $p(u,t)$ must have a
continuous bounded second derivative in $t$. This can be fulfilled with the
leakage analysis of a single bit of a key only. However, Chothia and Guha also
demonstrate experimentally that the test works well in cases of multiple
inputs, often outperforming other two-sample tests [10].

Under the assumption of a continuous leakage distribution, we are estimating
a hybrid version of the MI:

$$I(K;T) = \sum_{k \in \mathcal{K}} \Pr\{k\} \int_{\mathcal{T}} \Pr\{t|k\} \log \frac{\Pr\{t|k\}}{\Pr\{t\}} \, dt \qquad (3)$$

To compute this estimate we are required to estimate a conditional probability
density function $\Pr\{t|k\}$ using kernel density estimation. The assumptions
underlying the test’s convergence result dictate the use of a function such as
the Epanechnikov kernel⁴ as the chosen kernel function, and a bandwidth
function such as Silverman’s [?] general purpose bandwidth⁵.
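
A kernel density estimate of this kind can be sketched as below. The kernel is the standard Epanechnikov form $K(u) = \frac{3}{4}(1 - u^2)$ on $|u| \le 1$, and the bandwidth is the common rule of thumb $h = 1.06\, s\, N^{-1/5}$; both function names are chosen here for illustration. To estimate $\Pr\{t|k\}$ one would call `kde` on only those power values recorded under sub-key $k$.

```python
import statistics


def epanechnikov(u):
    """Epanechnikov kernel: 3/4 (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * s * N^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def kde(samples, t, h=None):
    """Estimated density at t from the given samples."""
    if h is None:
        h = silverman_bandwidth(samples)
    return sum(epanechnikov((t - s) / h) for s in samples) / (len(samples) * h)
```

Because the kernel has bounded support, the estimate is exactly zero further than `h` from every sample, which matches the finite-support assumption made on the power values.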

Using this estimated density function, we can compute an estimate of the MI,
$\hat{I}(K;T)$. The following step of the hypothesis test is a permutation
stage requiring $s$ permutations of the sampled data $T'$: for each sampled
power value, we randomly assign a new sub-key to the value without replacement.
The power values contained in each permuted set should now have no relation
with the sub-keys, and so the MI of the $s$ sets can now be computed, giving
$\hat{I}_1(K;T'), \ldots, \hat{I}_s(K;T')$ and providing a baseline for zero
MI.

An estimated p-value can be computed as the percentage of the MI estimates
$\hat{I}_1, \ldots, \hat{I}_s$ that have a value greater than the observed
point estimate $\hat{I}(K;T)$. The suggested number of shuffled estimates to
achieve useful baseline results is given to be 100 by Chothia and Guha, but to
increase the power of the test and the precision of the estimated p-values a
few thousand shuffles may be required.
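
The permutation stage does not depend on the statistic being permuted, so it can be sketched generically. In the code below `statistic` stands for any function of (sub-key labels, power values); the CMI estimator would be plugged in here. The `abs_mean_diff` helper is only a lightweight stand-in for demonstration, and is an assumption of this sketch rather than part of the CMI test.

```python
import random


def permutation_p_value(keys, traces, statistic, shuffles=100, seed=0):
    """Fraction of shuffled statistics strictly greater than the observed one."""
    rng = random.Random(seed)
    observed = statistic(keys, traces)
    shuffled = list(keys)
    exceed = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)            # reassign sub-keys without replacement
        if statistic(shuffled, traces) > observed:
            exceed += 1
    return exceed / shuffles


def abs_mean_diff(keys, traces):
    """Stand-in statistic for a binary label: |mean(label 0) - mean(label 1)|."""
    g0 = [t for k, t in zip(keys, traces) if k == 0]
    g1 = [t for k, t in zip(keys, traces) if k == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))
```

Note that the resolution of the estimated p-value is `1 / shuffles`, which is one reason a few thousand shuffles are needed when very small significance levels are used.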

3 Evaluation Methodology 

3.1 Comparing Detection Tests 

The most important notion in hypothesis testing is of the quantification and
classification of the error involved. The type I error rate $\alpha$ is defined
as the probability of incorrectly rejecting a true null hypothesis, usually
termed the significance criterion. Tests are also associated with a type II
error rate $\beta$: the probability of failing to reject a false null
hypothesis. The exact valuation assigned to these error rates is an important
factor to balance; typically decreasing one error rate will result in an
increase in the other, and the only way to reduce both in tandem is to increase
the sample size available to the test. The statistical power of a test is
defined as the probability of correctly rejecting a false null hypothesis,
$\pi = 1 - \beta$. This is the key factor for our detection tests: higher
statistical power indicates increased robustness and lessens reliance on large
sample sizes.

A common motivation for performing an a priori statistical power analysis⁶ is
to compute or estimate the minimum sample size required to detect an effect of
a given size, or to determine the minimum effect size a test is likely to
detect when supplied with a particular sample size. The determination of sample
sizes required to achieve acceptable power has two uses: firstly, data
acquisition from a cryptographic device is an expensive and time-consuming
operation, and so tests that are less data-hungry are likely to be preferable,
and secondly, knowledge of the sample sizes required to detect a particular
effect can serve as a guideline for evaluators to determine the number of trace
acquisitions sufficient for detecting an information leak.

4 Epanechnikov’s kernel function is defined as
$K(u) = \frac{3}{4}(1 - u^2)\chi_{\{|u| \le 1\}}$.

5 $h = 1.06\, s_T N^{-1/5}$, where $s_T$ is the sample standard deviation of
$T$ and $N$ is the number of sampled traces.

6 For further discussion of statistical power analysis, see [?].

3.2 Multiple Testing 

When considering the results of large numbers of simultaneously-computed
hypothesis tests, we must take into account that the probability that at least
one test falsely rejects the null hypothesis increases with the number of tests
computed. A single test computed at significance level $\alpha = 0.05$ has a 5%
chance of incorrectly rejecting a true null hypothesis; when conducting a large
number of simultaneous tests the probability of at least one false positive
grows towards 1. The intuitive solution is to control the overall false
rejection rate by selecting a smaller significance level for each test. There
are two main classes of procedure: controlling the familywise error rate (FWER)
and controlling the false discovery rate (FDR).
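
To make the growth concrete, the following short calculation assumes $m$ independent tests, each run at level $\alpha$ on a true null hypothesis. Independence is a simplifying assumption (neighbouring time points in a trace are usually correlated), and the Bonferroni correction shown in the last line is only the simplest example of choosing a smaller per-test level.

```latex
\begin{align*}
\Pr\{\text{at least one false rejection}\} &= 1 - (1 - \alpha)^m \\
\alpha = 0.05,\ m = 100: \quad & 1 - 0.95^{100} \approx 0.994 \\
\text{per-test level } \alpha / m \text{ (Bonferroni)}: \quad &
  \mathrm{FWER} \le m \cdot \frac{\alpha}{m} = \alpha
\end{align*}
```

A trace with thousands of time points therefore pushes the per-test significance level down by several orders of magnitude under such a correction.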

Familywise error rate. The FWER is defined as the probability of falsely
rejecting one or more true null hypotheses (one or more type I errors) across a
family of hypothesis tests. The FWER can be controlled, allowing us to bound
the probability of making any false rejection of a null hypothesis; in our
device evaluation context this would allow the evaluator to control the
probability a device is falsely rejected. FWER controlling procedures are
conservative, and typically trade a lower FWER for an increased type II error
rate.

False discovery rate. Proposed by Benjamini and Hochberg in 1995, the FDR is
defined as the expected proportion of false positives (false discoveries)
within the hypothesis tests that are found to be significant (all discoveries).
Procedures that control the FDR are typically less stringent than FWER-based
methods, and have a strong candidacy for situations where test power is
important. The Benjamini-Hochberg (BH) procedure is a ‘step-up’ method that
strongly controls the FDR at a rate $\alpha$ [5]. Given $m$ simultaneous
hypothesis tests, the BH procedure sorts the p-values in ascending order,
$p_{(1)} \le \ldots \le p_{(m)}$, and selects the largest $k$ such that
$p_{(k)} \le \frac{k}{m}\alpha$, where all tests with p-values less than or
equal to $p_{(k)}$ can be rejected. Many additional FWER and FDR controlling
methods exist, e.g. [13,?], but are beyond the scope of this paper.
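
The step-up rule is short enough to state in code. The sketch below is a plain implementation of the BH procedure as described above, with the function name chosen here; it returns the indices of the time points whose null hypothesis is rejected.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of the hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank                                      # keep the largest rank that passes
    return sorted(order[:k])
```

The ‘step-up’ behaviour shows in an input such as `[0.02, 0.03, 0.035, 0.04]` at $\alpha = 0.05$: the smallest p-value fails its own threshold of 0.0125, but the third and fourth pass theirs, so all four hypotheses are rejected.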

A trade-off with multiplicity corrections that control the FWER is that
generally decreasing the FWER results in an increase in type II error. As a
consequence the FDR approach may be more suitable if an evaluator is
particularly concerned with ensuring that the type II error rate is kept low,
that is, that the statistical power remains high. It may also serve a useful
purpose by identifying a small candidate set of time points that are likely to
contain information leakage; the evaluator can then perform further analysis on
the set of points, for example by inspecting the effect sizes reported for each
of the points, re-sampling additional data and performing new hypothesis tests,
or even by trying to attack the points using an appropriate method. We
demonstrate an example application of the BH procedure in Section [5].

3.3 Why Perform an a priori Power Analysis? 

Having established the importance of statistical power to our detection tests,
the motivation for performing an a priori power analysis for our three
candidate tests is that it is not possible to make generally true inferences
based on practical experiments alone; given that it is only possible to
establish the vulnerability of a time point by successfully attacking it, it
becomes impossible to establish whether a reported rejection of the null
hypothesis is a false positive, or whether a failure to reject it is a false
negative. In other words, the type II error rate $\beta$ cannot be estimated,
and hence any a posteriori (or post-hoc) power analysis is likely to be
misleading.

To be able to perform an a priori statistical power analysis, we need to be
able to produce or simulate data, ideally with characteristics as close as
possible to those observed in practice, for which we are sure of the presence
of information leakage. The most straightforward way to do this is to simulate
trace data under the ‘standard’ DPA model commonly used throughout the existing
body of literature, detailed in Section 2.1.

4 A priori Power Analysis 

As all of the variables in the standard SCA model outlined in Section 2.1 have
an effect on detection test performance, to perform a useful a priori power
analysis we defined a variety of leakage scenarios that have relevance to
practice, and then estimated the power $\pi$ of each of the detection tests
under many combinations of the different parameters in the SCA model for each
scenario. For each leakage scenario, power was estimated under varying sample
sizes, noise levels and using two different significance criteria:
$\alpha = 0.05$ and $\alpha = 0.00001$. The former provides a general
indication of test power with a common level of significance, and the intention
with the latter level of significance is to gain an understanding of how much
statistical power is degraded by the typical tightening of the significance
criteria enforced by multiple testing corrections.

Leakage model. We defined five different practically-relevant leakage models
$L$ under which to simulate trace data:

1. Hamming weight: a standard model under which the device leaks the
Hamming weight of the intermediate state;

2. Weighted sum: the device leaks an unevenly weighted sum of the bits of
the intermediate state, where the least significant bit (LSB) dominates with
a relative weight of 10, as motivated by [5];

3. Toggle count: the power consumption of hardware implementations has
been shown to depend on the number of transitions that occur in the S-Box.
The model used here is computed from back-annotated netlists as in [19],
and creates non-linear leakage distributions;

4. Zero value: for this model we set the power consumption for every
non-zero intermediate value to be 1, and for the value zero we set the power
consumption to be 0; this will typically produce small amounts of information
leakage and should stress the data efficiency of the tests;

5. Variance: the mean of the power consumption does not leak, and the
variance of the power consumption follows the distribution given in Maghrebi
et al. [18]. The t-test will not be able to detect any leakage, but the model
can be used to evaluate the relative performances of the MI tests.
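
Three of the five models are simple enough to write down directly, and the sketch below does so for a one-byte intermediate state. The weighted-sum function assumes that all bits other than the LSB have weight 1, which is one reading of ‘relative weight of 10’ and is not confirmed by the text. The toggle-count and variance models depend on netlist data and on the distribution in [18] respectively, so they are not reproduced here.

```python
def hamming_weight(v: int) -> int:
    """Model 1: number of set bits of the intermediate state."""
    return bin(v).count("1")


def weighted_sum(v: int, lsb_weight: int = 10) -> int:
    """Model 2: weighted bit sum; the LSB counts lsb_weight, other bits count 1 (assumed)."""
    lsb = v & 1
    return lsb_weight * lsb + hamming_weight(v >> 1)


def zero_value(v: int) -> int:
    """Model 4: leakage 0 for the value zero and 1 for every other value."""
    return 0 if v == 0 else 1
```

Under a uniform byte the zero-value model separates only one value out of 256, which is why it produces so little leakage compared with the Hamming weight model.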

A statistical power analysis would ideally be performed for each candidate
target function; given the limited space available we have focused on the AES.
For this comparison we targeted, without loss of generality, the first byte of
the key. For each leakage model, we simulated traces under a wide range of
signal-to-noise ratios (SNRs), ranging from $2^{-14}$ to $2^{12}$, enabling us
to assess the maximum amount of noise a test can overcome when provided with a
particular sample size.
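
For simulation purposes an SNR has to be turned into a noise level. The sketch below assumes the common definition of SNR as the variance of the noiseless leakage divided by the variance of the noise, with the intermediate state uniformly distributed over a byte; this excerpt does not state the definition used, so treat it as an assumption.

```python
import math
import statistics


def hamming_weight(v: int) -> int:
    """Number of set bits in v."""
    return bin(v).count("1")


def noise_sigma(leak, snr, values=range(256)):
    """Noise standard deviation giving the requested SNR = Var(signal) / Var(noise)."""
    signal_var = statistics.pvariance([leak(v) for v in values])
    return math.sqrt(signal_var / snr)
```

Under this definition the Hamming weight of a uniform byte has variance 2, so an SNR of $2^{-3}$ corresponds to Gaussian noise with standard deviation 4.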

Estimation process. The estimated power for the test is computed as the fraction 
of times the test correctly⁷ rejects the null hypothesis over 1,000 tests run. For 
the CMI and t-tests we used the significance criterion α to determine rejection 
or acceptance, and for the DMI test we checked whether the corrected estimate 
for the MI was inside the 100(1 − α)% confidence interval for zero MI. 
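
The estimation process for the t-test can be sketched as a small Monte Carlo loop. The sketch below is a simplified stand-in and makes several assumptions not taken from the text: the helper names are hypothetical, the leakage is the Hamming weight of a uniformly random byte, the test partitions traces on the LSB, the SNR is the ratio of signal variance to noise variance, and the p-value uses a normal approximation to the distribution of the Welch t statistic.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def hw(v):
    """Hamming weight of a byte."""
    return bin(v).count("1")

def t_test_rejects(traces, bits, alpha):
    """Two-sample t-test on traces partitioned by a state bit."""
    g0 = [t for t, b in zip(traces, bits) if b == 0]
    g1 = [t for t, b in zip(traces, bits) if b == 1]
    se = math.sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    t_stat = (fmean(g1) - fmean(g0)) / se
    # Assumption: normal approximation to the t distribution (large samples).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return p_value < alpha

def estimate_power(n_traces, snr, alpha=0.05, n_tests=1000, leak=hw, seed=0):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(2.0 / snr)  # 2.0 = variance of the HW of a uniform byte
    rejections = 0
    for _ in range(n_tests):
        states = [rng.randrange(256) for _ in range(n_traces)]
        traces = [leak(s) + rng.gauss(0.0, noise_sd) for s in states]
        bits = [s & 1 for s in states]
        if t_test_rejects(traces, bits, alpha):
            rejections += 1
    return rejections / n_tests

# Reduced number of repetitions to keep the example fast.
power = estimate_power(n_traces=1000, snr=1.0, n_tests=50)
```

Repeating the call over a grid of sample sizes and SNRs, and recording the smallest sample size at which the estimate reaches 0.8, would give curves of the kind reported in the next section.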

In the following section we present the results of our a priori statistical power 
analysis on the five leakage models in terms of the number of samples required 
to achieve 80% power for each combination of model, SNR and sample size. We 
performed 1,000 permutations of the simulated traces for each CMI test, and 
used the Epanechnikov kernel with Silverman’s bandwidth for the kernel density 
estimation. To enable a fair comparison between the bit and byte level tests, we 
chose to represent the results for the t-test corresponding to the most leaky bit 
of the state. Graphs illustrating the number of samples required by each test to 
achieve 80% power for each leakage model and SNR are shown in Figure 1. 
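
A permutation test on a kernel-based MI estimate, of the general kind used for the CMI test, might look like the sketch below. It is a minimal illustration under stated assumptions, not the estimator used in the paper: the function names are hypothetical, the MI is estimated by resubstitution, the bandwidth is a normal-reference rule of thumb (1.06 times the standard deviation times n^(-1/5)) applied without the rescaling usually recommended for the Epanechnikov kernel, and the p-value uses the (1 + count) / (1 + permutations) convention.

```python
import math
import random
from statistics import stdev

def epanechnikov(u):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def bandwidth(sample):
    """Rule-of-thumb bandwidth (assumed form, see lead-in)."""
    return 1.06 * stdev(sample) * len(sample) ** (-0.2)

def kde(x, sample, h):
    """Kernel density estimate at x."""
    return sum(epanechnikov((x - s) / h) for s in sample) / (len(sample) * h)

def mutual_information(traces, labels):
    """Resubstitution estimate of I(label; trace) in bits."""
    groups = {}
    for t, k in zip(traces, labels):
        groups.setdefault(k, []).append(t)
    h_all = bandwidth(traces)
    h_cls = {k: bandwidth(g) for k, g in groups.items()}
    total = 0.0
    for t, k in zip(traces, labels):
        # Each point lies in its own sample, so both densities are positive.
        total += math.log2(kde(t, groups[k], h_cls[k]) / kde(t, traces, h_all))
    return total / len(traces)

def permutation_p_value(traces, labels, n_perm, rng):
    """Share of label permutations whose MI estimate reaches the observed one."""
    observed = mutual_information(traces, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(traces, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy data: two classes whose trace means differ by three noise standard deviations.
rng = random.Random(1)
labels = [i % 2 for i in range(60)]
traces = [3.0 * k + rng.gauss(0.0, 1.0) for k in labels]
p = permutation_p_value(traces, labels, 99, rng)
```

The null hypothesis of no leakage would be rejected when p falls below the chosen significance criterion α.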

Hamming weight. We can see that the t-test is the most powerful test in 
general, as we would expect given the unbiased estimator for the mean values 
and the Gaussian noise assumption holding true in the model. The CMI test 
requires slightly more samples to achieve the requisite power in the presence of 
high noise, and both tests seem to perform equivalently for mid-range and low 
levels of noise. 

The DMI test appears to be significantly less powerful; this is unsurprising, 
as treating continuous data as discrete is expected to lose information. We 
also see that the test struggles to cope with high levels of noise: the lowest 
SNR for which we could detect an information leak with up to 192,000 samples 
was $2^{-3}$. A closer inspection indicates that this is caused by the bias 
correction required; the size of the input space for the AES often 
necessitates a large sample size to keep the correction within 
manageable bounds. 

7 Each of these scenarios contains information leakage; even for the extremely low 
SNRs, given sufficiently large data an attacker will eventually be able to exploit the 
leakage, and as a consequence candidate detection tests should, for some level of 
sample size, be able to consistently detect information leakage. 

Fig. 1. Number of samples required for the t-test, CMI and DMI tests to achieve 
estimated 80% power for a variety of leakage models and SNRs. 


The stricter significance criterion α = 0.00001 seems to have a small but 
noticeable effect on the test power for the CMI and t-tests. Under the DMI test 
we see little change in behaviour; the dominant factor influencing power is the 
bias correction rather than the precise width of the confidence intervals. 

Weighted sum. The relative dominance of the LSB in the leakage provides 
an additional advantage for the t-test and we found, as expected, that the test 
achieved its highest power when evaluating this bit. This results in a larger 
increase in overall power relative to the CMI test than we observed in the 
Hamming weight scenario, and also allows for detection of leakage at lower 
SNRs. The CMI test seems to exhibit performance consistent with that under 
the Hamming weight model, and similarly for the DMI test. The effects of the 
stricter significance criterion are also similar, with noticeable reductions in 
power observed for each of the tests under the smaller α values save for the 
DMI test, where again the bias correction is the predominant factor. 

Toggle count. An analysis of the underlying true distance of means for the 
Toggle count model indicated that the largest information leakage was 
contained within the second-least significant bit, and that this leakage was 
twice that of the next most leaky bit. As with the Weighted sum model, the 
relative dominance of this bit supplies the t-test with an advantage over the CMI test but in 
this instance the advantage is by a smaller margin. We can also see that the CMI 
test appears to be significantly more robust to the stricter significance criterion, 
outperforming the more sensitive t-test in all of the high noise settings. Here 
we also see the DMI test exhibiting an increased sensitivity to the significance 
criterion. 

Zero value. The size of the information leak present in a noise-free setting 
for the Zero value model is small relative to those in the other models: the 
true MI in a noise-free setting is 0.0369 and the true distance-of-means 0.0078. 
As such it is interesting to note the stronger performance of the CMI test in 
high noise settings relative to that of the t-test observed in these results — the 
additional information on the non-linear dependencies contained in the estimated 
MI values increases the power of the CMI test whereas the quantity of noise has a 
stronger effect on the difference-in-means estimated by the t-test. The low power 
estimates for the DMI test are consistent with the small size of the information 
leak in the model coupled with the loss of information in the conversion process 
of continuous to discrete data. 
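
The two noise-free figures quoted for the Zero value model can be checked with a short calculation. The sketch assumes a uniformly distributed intermediate byte, takes the noise-free MI to be the entropy of the (deterministic) leakage, and takes the distance of means over a partition on a single state bit; these assumptions are ours rather than statements from the text.

```python
import math

values = range(256)
leak = [0 if v == 0 else 1 for v in values]

# Noise-free MI between a uniform state byte and its leakage: since the leakage
# is a deterministic function of the state, this equals the leakage entropy.
p0 = leak.count(0) / 256
p1 = leak.count(1) / 256
mi_bits = -(p0 * math.log2(p0) + p1 * math.log2(p1))

# Distance of means when the values are partitioned on the LSB.
g0 = [l for v, l in zip(values, leak) if (v & 1) == 0]
g1 = [l for v, l in zip(values, leak) if (v & 1) == 1]
distance_of_means = abs(sum(g1) / len(g1) - sum(g0) / len(g0))

print(round(mi_bits, 4), round(distance_of_means, 4))  # prints: 0.0369 0.0078
```

Both values agree with those stated above; the distance of means is exactly 1/128, because only the partition containing the value zero has a mean below 1.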

Variance. By design the mean of the power consumption for all sub-key values 
is equivalent in the Variance model, and so the t-test cannot be applied. As a 
test for the applicability of the CMI and DMI to situations in which only higher- 
order moments leak, the CMI test appears to be robust, so that small sample 
sizes suffice to achieve the requisite power at medium and low noise levels. The 
true information leakage contained within the variances is strongly affected by 
the amount of noise in the samples, which explains why both tests soon begin 
to struggle as the SNR drops below $2^0$. 


Conclusion. The t-test was generally shown by the a priori power analysis to 
be the most powerful. This is not unexpected: the sample mean is a consistent, 
unbiased estimator for the population mean and converges quickly to the true 
value. The performance of the CMI test was close to that of the t-test in all 
scenarios, indicating that it remains a robust, if slightly inferior, alternative in 
the majority of settings. The DMI test was expected to be less powerful due to 
the loss of information by the conversion of continuous data to discrete, and this 
was observed in our analysis; the results indicate that the test is a viable choice 
only when supplied with large amounts of trace data and only when the SNR is 
high. 

Of note was the superior performance of the CMI test when detecting the 
small leaks produced by our Zero value model, particularly in high-noise set- 
tings. This suggests that the CMI test may be a better, or safer, choice when 
applied to devices with these sorts of characteristics. The results obtained under 
the Variance model indicate that the CMI test is sufficiently robust to han- 
dle ‘tough’ leakage scenarios in which the leakage is solely contained in higher 
moments of the power consumption distribution. 

5 Case Studies 

The a priori statistical power analysis is the primary method for comparison 
of the detection tests. To complement the analysis, and to further explore the 
effectiveness of multiplicity corrections, in the following section we demonstrate 
the application of the three detection tests to the evaluation of two crypto- 
graphic devices implementing the AES. The first device we analyse is an ARM7 
microcontroller implementing the AES in software, with no countermeasures ap- 
plied. This device would be expected to exhibit significant information leakage 
in Hamming-weight form, and hence is a good opportunity to analyse the 
efficacy of multiple testing correction procedures. The second device analysed is 
a Sasebo-R evaluation board manufactured using a 90nm process implementing 
AES in hardware with a Positive-Prime Reed-Muller (PPRM) based SubBytes 
operation using single-stage AND-XOR logic. This second case study is 
intended to investigate the performance of the detection tests under increasingly 
complex leakage distributions as well as acting as a further test for the 
multiplicity corrections. 


5.1 ARM7 Microcontroller 

Our data set contained 32,000 traces from the device and we chose to evaluate 
the first key byte for information leakage. For the t-test we analysed the output 
of the first SubBytes operation. Figure 2 illustrates the estimated MI values 
and t-test statistics produced by the detection tests run at a significance level 
α = 0.05 for each of the 200,000 time points in our traces. For the CMI test we 
performed 1,000 permutations of the traces at each time point, and as we found 
that all 8 of the bits in the intermediate state produced similar information 
leakage we elected to display the results for the LSB. 

At the initial significance level α = 0.05, the CMI test identified 9,360 time 
points consistent with information leakage, the discrete test 178, and the t-test 
9,713. These occur across the full range of the traces, and account for around 4.8% 
of the total in the CMI and t-test cases. Using our prior knowledge of the device we 
could ascertain that many of these points are likely to be false positives. 

Fig. 2. Estimated CMI and DMI values and t-test statistics produced using 32,000 
traces during an evaluation of an ARM7 microcontroller implementing a software 
version of the AES. 

To gain an indication of how many of these time points actually contain 
exploitable leakage, we conducted a battery of attacks on the output of the 
SubBytes operation at all of the time points, using the same set of traces. The 
attacks were Brier et al.’s correlation power analysis (CPA) [7] and Gierlichs 
et al.’s mutual information analysis (MIA) [12], both using a Hamming weight 
power model, and Kocher et al.’s difference of means. Whilst we have argued 
that practical results should not be used to perform a post hoc power analysis, 
the results of the DPA attacks can be used to quantify under-performances of 
the three tests: time points that can be successfully attacked but are missed 
by the detection tests are indicative of low statistical power given the 
available sample size. In this regard the only notable false acceptances of the 
null hypothesis occurred under the DMI test, with the CMI and t-tests able to 
spot the vast majority of the vulnerable time points. These results appear to 
be consistent with those observed under the Hamming-weight scenario in the 
statistical a priori power analysis. 

False discovery rate. Applying any correction to the results produced by the 
DMI test is redundant as the ‘raw’ results are already highly unlikely to contain 
falsely rejected null hypotheses. The FDR controlling procedures are likely to 
be the most successful of the multiple testing corrections for our purposes, so 
we applied the Benjamini-Hochberg correction to the results produced by the 
CMI and t-tests, controlling the FDR at the levels 0.05 and 0.5. Using prior 
knowledge of the device and the results of the DPA attacks we would not expect 
to observe any information leaked about the first key byte after time 25,000. 
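
For reference, the Benjamini-Hochberg step-up procedure can be written in a few lines. The sketch below follows the standard textbook definition (reject the k smallest p-values, where k is the largest rank with p_(k) <= k q / m); the function name and the example p-values are hypothetical and are not taken from the experiments.

```python
def benjamini_hochberg(p_values, q):
    """Return a list of booleans, True where the null hypothesis is rejected,
    controlling the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for ten time points.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
strict = benjamini_hochberg(p, 0.05)   # rejects only the two smallest p-values
loose = benjamini_hochberg(p, 0.5)     # rejects all ten
```

The toy example shows the behaviour one would expect from the two FDR levels: the looser level admits many more rejections, and hence more false positives.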

The effect of increasing the value of the FDR on the type I error can be 
observed in the larger number of false positives produced when the FDR is 0.5. 
The t-test appears to react more effectively to the corrective procedure, 
eliminating larger numbers of the false positives previously observed at the 
time points greater than 25,000. An inspection of the p-values reported by the 
CMI test indicates that the number of permutations performed is the proximate 
cause of the under-performance: the 1,000 executed do not appear to produce 
enough precision in the estimated p-values to allow the step-up procedure to 
differentiate between neighbouring tests. The procedures do not appear to 
result in a significant rise in type II error; the increase is smaller with the 
looser FDR of 0.5, but appears to be slight in both cases. As always, increasing 
the sample size available would reduce the size of any increase in type II error. 
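
A rough calculation suggests why the number of permutations limits the step-up procedure. Assume, as is common but not confirmed by the text, that a permutation p-value is estimated as (1 + number of permuted statistics at least as large as the observed one) / (1 + B) for B permutations, so that its smallest attainable value is 1 / (B + 1). The Benjamini-Hochberg condition p_(k) <= k q / m can then only be met from some minimum rank k onwards. The function name below is hypothetical.

```python
import math
from fractions import Fraction

def min_rank_for_rejection(n_tests, n_perm, q):
    """Smallest rank k at which the smallest attainable permutation p-value,
    1 / (n_perm + 1), can satisfy p <= k * q / n_tests (exact arithmetic)."""
    p_min = Fraction(1, n_perm + 1)
    return math.ceil(p_min * n_tests / Fraction(str(q)))

# ARM7 case study: 200,000 time points and 1,000 permutations per time point.
k_strict = min_rank_for_rejection(200_000, 1_000, 0.05)  # 3997
k_loose = min_rank_for_rejection(200_000, 1_000, 0.5)    # 400
```

Under these assumptions, no time point can be flagged at an FDR of 0.05 unless roughly 4,000 time points attain p-values near the minimum, and all time points sharing the same minimal p-value are necessarily accepted or rejected together.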

5.2 Hardware AES with PPRM SubBytes Implementation 

The dataset contained 79,360 traces from the device at 5 giga-samples per second 
and we again chose to evaluate the first key byte for information leakage; for the 
t-test we analysed the output of the first SubBytes operation. Figure 4 illustrates 
the estimated MI values and t-test statistics produced by the detection tests run 
at a significance level α = 0.05 for each of the 50,000 time points in our traces. 
The first and last 10,000 points are not displayed as they do not correspond to 
any part of the full AES operation. For the CMI test we increased the number of 
permutations to 10,000 per time point in an attempt to gain additional precision 
on the estimated p-values. Information leakage was found to occur to a varying 
degree across all 8 bits of the intermediate state when using the t-test; as such, 
we have elected to superimpose the results for all of the state bits on a single 
graph. The DMI test was not able to identify any information leakage. 

A visual inspection of the results produced by the CMI and t-tests 
indicates that there are 10 groups of points within the power traces that contain 
significant amounts of information leakage. As would be expected the shape and 
scale of the leakages differ: the t-test is only assessing the SubBytes operation 
and the leakage of individual bits. We were able to confirm the vulnerability of 
the device by successfully executing a reduced Bayesian template attack on the 
intermediate values of the SubBytes operation at the time points the detection 
tests indicated would be vulnerable. The hardware device exhibits less, but still 
significant, leaking behaviour when compared to the ARM7 microcontroller 
implementation, as evidenced by the lower mutual information estimates and the 
smaller t-test statistic scores. 





Fig. 3. Plots of the time points consistent with information leakage after applying the 
Benjamini-Hochberg FDR controlling procedure to the results produced by the t-test 
and CMI test. 



Fig. 4. Estimated I(K;T) values produced by the CMI test and t-test statistics 
produced using 79,360 traces taken from an evaluation of a hardware AES device with 
the SubBytes operation using Positive-Prime Reed-Muller (PPRM) logic. 

The performance of the CMI and t-tests appears to be similar. The extra 
definition in the CMI graph is likely due to the t-test assessing leakage from the 
output of the SubBytes operation only. The DMI test could not identify any 
information leakage, indicating that many more samples would be required to 
begin to match the power of the CMI and t-tests. 

False discovery rate. The Benjamini-Hochberg correction was applied to the 
results produced by the CMI and t-tests, this time controlling the FDR at the 
levels 0.05 and 0.005. The previous FDR of 0.5 used in the analysis of the ARM7 
device yielded too many clear false rejections of the null hypothesis, possibly due 
to the smaller number of time points, and as a consequence two stricter criteria 
were used. Figure 5 shows the results of applying the two criteria to the results 
produced by the CMI and t-tests. The effectiveness of the multiplicity corrections 
is lessened in the hardware device evaluation. The t-test again reacts better to the 
stricter corrective procedure, eliminating larger numbers of likely false positives. 
Despite the increase of permutations per time point from 1,000 to 10,000 for 
the CMI test, the effectiveness of the multiplicity correction is again dampened 
by the lack of precision available in the estimated p-values. It is likely that a 
different, more complex approach may be required to effectively mitigate the 
multiplicity problem under the CMI test. 

6 Computational Complexity 

If we consider commercial and logistical pressures on the evaluation process then 
we must also include the computational complexity of the detection tests as a 
factor in our evaluation. In this regard, the CMI test is particularly expensive. 
Under reasonable parameters of a data set of 80,000 traces each consisting of 
50,000 sampled time points, and where the test computes 1,000 permuted 
estimates of the MI at each time point, a full run of the detection test on a single 
key byte necessitates the evaluation of 50 million continuous MI values. If we 
factor in the cost of finding conditional probability density functions, then we 
may expect to perform in total $2.05 \times 10^{15}$ ($\approx 2^{51}$) evaluations 
of the kernel function used in the density estimation, at a total cost of roughly 
$1.64 \times 10^{16}$ floating point operations. 

This presents a significant obstacle; we estimated that our naive single-CPU 
implementation would take around a month to analyse a device. However, the 
problem is ‘embarrassingly parallel’ and we implemented the test in parallel form 
using OpenCL: using two AMD Radeon 7970 GPUs we were able to execute a 
test with the above parameters in approximately 14 hours, a throughput of 300 
GFLOPS. The addition of inexpensive GPUs decreases the running time linearly, 
ensuring that the CMI test, even with large data set parameters, is feasible to 
run. By comparison the DMI and t-tests are efficient; a key byte can be fully 
assessed for leakage in under 30 minutes. 
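
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only manipulates numbers stated in the text; the derived ratios (operations per kernel evaluation, kernel evaluations per MI estimate) are our own back-of-the-envelope quantities and are not claims made by the analysis.

```python
import math

time_points = 50_000
permutations = 1_000
mi_estimates = time_points * permutations              # 50 million MI values

kernel_evals = 2.05e15   # quoted number of kernel function evaluations
flops = 1.64e16          # quoted number of floating point operations
gpu_hours = 14           # quoted running time on two GPUs

log2_kernel_evals = math.log2(kernel_evals)             # about 50.9, i.e. close to 2^51
flops_per_kernel_eval = flops / kernel_evals            # about 8
kernel_evals_per_estimate = kernel_evals / mi_estimates # about 4.1e7
throughput_gflops = flops / (gpu_hours * 3600) / 1e9    # about 325
```

The implied throughput of roughly 325 GFLOPS is consistent with the approximate figure of 300 GFLOPS given above.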

Fig. 5. Plots of the time points consistent with information leakage after applying the 
Benjamini-Hochberg FDR controlling procedure at levels 0.05 and 0.005 to the results 
produced by the t-test and CMI test for the hardware AES implementation. 


7 Conclusion 

Taking the perspective of a ‘black-box’ evaluation, in which the evaluator may have 
little knowledge about the leakage characteristics of the device, it would be 
desirable to select a leakage detection test that is the most generally applicable and that 
has the best all-round performance. In the majority of our a priori analysis this was, 
by a small margin, the t-test. However, we must also take into account the inherent 
limitation of the t-test: its inability to measure leakage in any moment other than 
the mean. If an evaluator wished to gain the most coverage over all possible leakage 
scenarios, then, given the significant under-performance of the discrete version in 
the a priori analysis, the CMI test is the only viable candidate. 

The complexity of the tests is an additional factor to consider. The t-test must 
be re-run for every bit and every intermediate operation within the algorithm 
implemented on the device, whereas the CMI and DMI tests need only to be run 
once per bit or byte of key analysed. At first glance the computational cost of 
the CMI test appears to be prohibitive, but we have demonstrated that using 
relatively inexpensive GPUs and the inherently parallel nature of the problem, 
the running time can easily and cheaply be reduced to insignificant levels. 

In the absence of any general result that can translate MI, entropy or a dif- 
ference of means into the trace requirements for an adversary, the interpretation 
of the results of any standardised detection test becomes heavily reliant on the 
tools provided by statistics. The large body of work on multiplicity corrections is 
a rich resource to draw upon, and further research in this area may yield useful 
results. In addition, a multivariate detection procedure capable of detecting any 
higher-order information leakage warrants research effort. 


Abstract. Since the introduction of side channel attacks in the nineties, 
a large amount of work has been devoted to improving their effectiveness 
and efficiency. On the one side, general results and conclusions 
are drawn in theoretical frameworks, but the latter are often set in 
too idealized a context to capture the full complexity of an attack performed 
in real conditions. On the other side, practical improvements are 
proposed for specific contexts but the big picture is often put aside, which 
makes them difficult to adapt to different contexts. This paper tries to 
bridge the gap between both worlds. We specifically investigate which 
kinds of issues are faced by a security evaluator when performing a state 
of the art attack. This analysis leads us to focus on the very common 
situation where the exact time of the sensitive processing is drowned in 
a large number of leakage points. In this context we propose new ideas 
to improve the effectiveness and/or efficiency of the three considered 
attacks. In the particular case of stochastic attacks, we show that the 
existing literature, essentially developed under the assumption that the 
exact sensitive time is known, cannot be directly applied when the latter 
assumption is relaxed. To deal with this issue, we propose an 
improvement which makes the stochastic attack a real alternative to the classical 
correlation power analysis. Our study is illustrated by various attack 
experiments performed on several copies of three micro-controllers with 
different CMOS technologies (respectively 350, 130 and 90 nanometers). 


1 Introduction 

Since the seminal differential power analysis of Kocher et al., various side 
channel attacks (SCA) have been proposed and improved (e.g. [B1I51 ITT1IT21I3I]). 
In order to compare and classify them, theoretical frameworks have then 
been introduced [ITJ[22l|35l|39]. Their main purpose is to identify the attacks’ 
similarities and differences, and to exhibit contexts where one is better than 
another. They have laid the foundation stones for a general comparison and 
evaluation framework. In parallel, several practical works have addressed issues 
arising when applying an SCA in the real world (e.g. in an industrial context) 
[2,5,16,24,37]. Those works essentially attempt to fill the gap between the 
theoretical analysis of the attacks and their application in non-idealized contexts. 
However, whereas the published theoretical analyses usually tend towards generic 
and formal statements (sometimes at the cost of too simple models), many of 
the practical analyses only focus on a particular attack specificity and often 
put the big picture aside. The latter analyses are indeed usually dedicated to 
one specific attack running against a specific target device, which makes them 
hard to generalize. This paper tries to be at the intersection of both worlds: 
we study practice-driven issues while keeping a generic approach w.r.t. attack 
mechanisms and targeted platforms. This approach and our final purpose are 
close to those in the works of Standaert et al. and Renauld et al. 

The starting observation of our study is that side channel traces are never 
reduced to one point in practice, even when they rely on the manipulation of a 
single variable. On the contrary, those traces are often composed of a large number of 
points (typically several thousands). Although this observation is evident, 
it is rarely taken into account when analysing the effectiveness of a side 
channel attack. Such an analysis is indeed frequently done under the assumption, 
sometimes implicit, that a small number of points of interest (POI) has already 
been extracted from the traces either by pattern matching [21], or by dimension 
reduction or thanks to a previous successful attack [TUI [211]. However, 
the first two categories of techniques are not yet perfect and, after reduction, 
the traces are often still composed of several points in practice. More 
importantly, the risk of information loss during the reduction process leads 
most attack practitioners not to apply them. The third technique (performing 
a first attack to identify the POI) allows for interesting analyses, but it does not 
correspond to a real attack context. Moreover, the best POI for one attack type 
may not be so good for another one. Eventually, we come to a situation where 
attacks are analysed in a (uni-dimensional) context which does not fit with the 
(multi-dimensional) reality faced by the attack practitioners. 

We argue in this paper that the state-of-the-art uni-dimensional analyses 
cannot be straightforwardly adapted to multi-dimensional contexts, which raises 
new interesting issues. The selection of the most likely candidate among the 
results of several instantaneous attacks is one of them. Indeed, when the leakage 
traces are composed of several points, a side channel attack against the targeted 
sensitive variable must be performed for each point (a.k.a. time index) in the 
traces. Then, the adversary must apply a strategy to select the most likely 
candidate among the different instantaneous attack results. A classical method is 
to select the one with the highest score (e.g. the highest correlation coefficient 
in a Correlation Power Analysis (CPA) [8]). Nevertheless we argue that this 
strategy can be ineffective for some attack categories, including the case of the 
Linear Regression Analysis (LRA) [11,30,31]. For the latter one, we propose a new 
strategy to select the most likely candidate and we demonstrate its effectiveness 
in practice. 

Another interesting issue when dealing with a large number of 
high-dimensional traces is the reduction of the computational complexity. Here again, some 
works have investigated the use of parallel computing to decrease the data 
processing time, but their goal was not to diminish the algorithmic complexity 
of the attacks. This work studies the LRA¹ and the Template Attacks (TA) with 
this goal in mind. A common structure in their algorithmic description is 
exhibited and then used to propose a new general modus operandi which 
significantly reduces the computation time when the number of traces is 
non-negligible. The strategy can also be applied to other attacks (e.g. the CPA). 

1 In this paper we only consider the unprofiled version of LRA. 

Finally, to make sure that our analysis is consistent with reality, we 
complemented our investigations with several experiments performed on three 
micro-controllers based on different CMOS technologies (350, 130 and 90 nanometer 
processes). We report here on the results of these experiments. We moreover use them to 
confirm and complete the interesting behaviours observed in [25]: (1) the 
leakage seems to diverge from the classical Hamming weight model as the CMOS 
technology tends to the nanometer scale, which makes LRA a promising tool for 
side channel evaluations of nano-scale devices and (2) TA is effective in practice, 
even when the templates are built on one copy of the device and the attack is 
done on another copy. 

The paper is organized as follows. In Section 2 we introduce the theoretical 
background for our study and we present the outlines of our proposal. Then, two 
sections are dedicated to the application of our ideas to the LRA and TA attacks 
respectively. 

2 SCA: Practical Issues 

In this section, we introduce some basics and then turn to the specifics of the 
problem addressed in this paper. 


2.1 Notations 

Throughout this paper, random variables are denoted by capital letters. A 
realization of a random variable, say $X$, is denoted by the corresponding 
lower-case letter, say $x$. A sample of several observations of $X$ is denoted by $(x_i)_i$. 
It will sometimes be viewed as a vector defined over the definition set of $X$. 
The notation $(x_i)_i \leftarrow X$ denotes the instantiation of the set of observations 
$(x_i)_i$ from $X$. The mean of $X$ is denoted by $E[X]$, its standard deviation by $\sigma[X]$ 
and its variance by $\mathrm{var}[X]$. The latter equals $E[(X - E[X])^2]$. The 
covariance of two random variables $X$ and $Y$ is denoted by $\mathrm{cov}(X, Y)$ and satisfies 
$\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. When we need to specify the 
variable on which statistics are computed, we will write the variable in subscript 
(e.g. $E_X[Y]$ instead of $E[Y]$). 

The notation $\vec{X}$ will be used to denote column vectors and $\vec{X}[u]$ will denote 
its $u$-th coordinate. Calligraphic letters will be used to denote matrices. The 
elements of a matrix $\mathcal{M}$ will be denoted by $\mathcal{M}[i][j]$. Classical additions and 
multiplications (over real values, vectors or matrices) are denoted by $+$ and $\times$ 
respectively. Scalar-vector operations are denoted by $\cdot$ and $/$ (all the coordinates 
of the vector are multiplied, respectively divided, by the scalar). When applied 
to vectors or matrices, the symbols $\cdot^2$ and $\sqrt{\cdot}$ denote the operation consisting in 
computing the square (resp. the square root) of all the vector/matrix coordinates. 
Finally, a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2^m$ will be called an $(n, m)$-function. 

2 This work is completed in the extended version of this paper with a similar study 
on CPA. 

2.2 General Attacks Framework 

In this paper, the attacks framework is described by considering that the 
adversary targets the manipulation of a single sensitive variable $Z$, but the study and 
results directly extend to contexts where several variables are targeted in 
parallel. The variable $Z$ is supposed to functionally depend on a public variable $X$ 
and a secret sub-part $k$ such that $Z = F(X, k)$, where $F$ is an $(n + n, m)$-function 
(which implies $X, k \in \mathbb{F}_2^n$ and $Z \in \mathbb{F}_2^m$). The bit-lengths $n$ and $m$ depend on the 
cryptographic algorithm and the device architecture. 

The attacks are described under the assumption that the adversary owns $N$ 
side channel traces $\vec{\ell}_0, \ldots, \vec{\ell}_{N-1}$, each of them containing information about 
$Z$. Namely, the $i$-th leakage trace $\vec{\ell}_i \leftarrow \vec{L}$ corresponds to the processing of 
a public value $x_i \leftarrow X$ and contains information on the value $z_i \leftarrow Z$ such 
that $z_i = F(x_i, k)$. The dimension of the traces (i.e. the number of different 
instantaneous leakage points) is denoted by $d$. By definition, we have $d = \dim \vec{L}$. 
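
To make the framework concrete, the following toy sketch generates N traces of dimension d in which a single, a priori unknown time index leaks information on Z = F(X, k). Everything specific in it is an assumption made for illustration: F applies a 4-bit S-box (the table is the one commonly given for the PRESENT cipher, used here as a stand-in) to the bitwise addition of x and k, the leakage is the Hamming weight of Z plus Gaussian noise, and the function names are hypothetical.

```python
import random

# 4-bit S-box used as a stand-in for the targeted transformation (n = m = 4).
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def F(x, k):
    """Sensitive variable: S-box applied to the bitwise addition of x and k."""
    return SBOX4[x ^ k]

def acquire_traces(k, n_traces, d, leak_index, noise_sd, rng):
    """Return the public inputs x_i and traces of dimension d; only the
    coordinate leak_index depends on z_i = F(x_i, k)."""
    xs, traces = [], []
    for _ in range(n_traces):
        x = rng.randrange(16)
        trace = [rng.gauss(0.0, noise_sd) for _ in range(d)]
        trace[leak_index] += bin(F(x, k)).count("1")  # Hamming weight leakage
        xs.append(x)
        traces.append(trace)
    return xs, traces

xs, traces = acquire_traces(k=0xA, n_traces=200, d=50, leak_index=17,
                            noise_sd=1.0, rng=random.Random(0))
```

An adversary who does not know the value of `leak_index` has to run the attack on each of the d coordinates, which is the situation studied in the rest of the paper.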

When little information is known about the implementation and the device 
(which is usually the case in practice), the exact manipulation time of $z_i$ cannot 
be precisely determined a priori. Also, precision in the observation often comes 
at the cost of a high sampling rate⁴. As a consequence, the dimension of the 
traces is usually high (from several thousands of points up to millions) and the 
attack must be repeated on all of their coordinates independently (as e.g. in 
LRA) or must consider huge trace chunks globally (as e.g. for TA). Despite their 
differences, most side-channel attacks (including LRA and TA) follow a 
common process flow. Starting from this generic description, this paper studies, 
in Sections 3 and 4 respectively, the effectiveness and efficiency of the LRA and 
TA attacks. The core ideas of those analyses are presented in the next two 
subsections. 


2.3 Effectiveness Discussions 

A part of our study is dedicated to the distinguisher value definition and, more 
precisely its relevance when considering side channel traces with a large number 
of points. This study was motivated by the observation that the classical LRA dis- 
tinguisher value for one leakage time is not comparable as such to that computed 
for another leakage time. Figure [l(a)| illustrates this claim for an LRA targeting 
the device B described in Section [T~5l when directly applying the protocol given 
in mm, the correct key candidate does not maximize the distinguisher value 

3 An example of function F is the function that applies a so-called sbox transformation 
to the bitwise addition between k and X. 

4 Especially in the case of Electro-Magnetic side channel measurements 
(a) without normalization (b) with normalization

Fig. 1. Instantaneous LRA scores computed over 10000 traces (scores for the correct 
key in black) 


globally but only in a local area, which makes the attack unsuccessful unless this
area is known by the adversary (which is not assumed here). This observation
led us to study the handling of distinguishing values in SCA attacks. For instance,
we show that by normalizing the LRA distinguishing values, the correct key
candidate becomes clearly distinguishable even when considering the full vector
of instantaneous attack results (as depicted in Figure 1(b)).

More generally, our study relies on a well-studied problem, namely the
comparison of the results of two different instantaneous attacks [11,20,33,34,36,38].
For the LRA, it will lead to a modification of the candidate selection rule.

2.4 Algorithmic Complexity Improvements Proposals 

The other important issue an evaluator faces when performing SCA is the
computational complexity of the attack when the number of traces N and their
dimension grow to millions. Indeed, the execution time of naive attack
implementations can easily reach several days of processing, which is not compatible
with standard evaluation processes (see footnote 5).

We show in Sections 3 and 4 that the two considered attacks may be
rewritten in a partitioning fashion that can be exploited to significantly decrease
the algorithmic complexity. Roughly speaking, the basic idea is to lower the
impact of the heavy computations so that their complexity no longer depends
on the number of traces N but on the dimension n of the targeted data. To that
purpose, we propose to modify the first step of the attack so that it processes the
traces separately with respect to their input value $x_i$. As a result, the algorithmic
complexity of the attacks is divided by $N/2^n$, making the algorithmic improvement
interesting when $N \gg 2^n$ (which is often the case in practice).

5 In Common Criteria evaluations applied on hardware security devices, all
penetration tests (including invasive and non-invasive attacks) usually have to be performed
in 3 months, leaving only a few weeks for the whole side channel evaluation.
Almost every SCA may be rewritten as a combination of tests on statistics
estimated on leakage partitions. Some of them (e.g. the DPA m or the multi-bit
DPA [23]) were actually originally written as such, whereas the other ones were
reformulated in a partitioning way after their introduction (see e.g. [H] for the
CPA, [TU] for the LRA and [33] for the MIA). To the best of our knowledge, this
property has however never been exploited to improve the efficiency of the attacks.


2.5 Experimental Setup 

For each studied SCA, practical experiments were performed on three
Micro-Controller Units (MCUs for short) with different CMOS technologies (350, 130
and 90 nanometer processes). The observed processing was that of an AES128
encryption handling one byte at a time. Each attack was performed against 4
sbox outputs of the first round. Furthermore, to measure the variability of our
experiments, we used three different copies of each MCU (called copy 1, 2 and
3 in the sequel). This choice enabled us to perform the TA profiling step on one
copy and to use the results to attack the other ones. Also, it gives more credit to our
experimental results as the consistency of the templates was checked on three different
versions of the same MCU.

The side channel observations were obtained by measuring the electromagnetic
(EM) radiation emitted by the device. To this aim, several sensors were used,
all made of several coils of copper (the diameters of the coils were respectively
1 mm, 500 µm and 250 µm for the 350, 130 and 90 nm MCUs), and were plugged
into a low-noise amplifier. To sample measurements, a digital oscilloscope was
used with a sampling rate of 1G samples per second for the 350nm MCU and
10G samples per second for the others, whereas the MCUs were running at a few
dozen MHz.

We insist on the fact that the temporal acquisition window was set to record
the first round of the AES only. This synchronization was done thanks to
simple electromagnetic analysis [2B]. As the MCU clocks were not stable, we had
to resynchronize the measurements. This process is out of the scope of this work,
but we emphasize that it is always needed in a practical context and that it impacts
the measurement noise.

We summarize the specificities of the three experimental campaigns hereafter:

- Device A (3 copies): 90nm CMOS technology with MCU based on an 8-bit
8051 architecture. EM traces composed of 12800 points each after
resynchronization. Highest Signal to Noise Ratio (SNR) over the full traces equal to
0.09.

- Device B (3 copies): 130nm CMOS technology with MCU based on an 8-bit
8051 architecture. EM traces composed of 16800 points each after
resynchronization. Highest SNR equal to 0.6.

- Device C (3 copies): 350nm CMOS technology with MCU based on an 8-bit
AVR architecture. EM traces composed of 51600 points each after
resynchronization. Highest SNR equal to 0.3.
3 Practical Evaluation of Linear Regression Attacks

Linear regression attacks (a.k.a. stochastic attacks) were introduced by
Schindler et al. in 2005 [5Tj. Initially, they were presented with a profiling step
and were viewed as an alternative to the template attacks [IB]. In [11], the
authors have shown how to express the linear regression attacks such that the
profiling stage is no longer required. They also argued that the LRA can be
applied in the same context as the CPA, but with weaker assumptions on the device
behavior. Subsequently, these results of Doget et al. have been extended in [TD] to
apply against masked implementations. In parallel, linear regression attacks have
been used to analyse/model the deterministic part of the information leakage for
complex circuits [H]|T5]. As a matter of fact, all those analyses assume that the
side-channel traces are composed of a single leakage point: the issue raised in
Section 2.3 is thus put aside. Moreover, the question of the efficient processing of
the attack, when applied against high dimensional leakage traces, is not tackled.
The rest of this section aims at dealing with these two issues.

3.1 Attack Description 

In LRA, the adversary chooses a so-called basis of functions (see footnote 6) $(m_p)_{1 \le p \le s}$ with the
only condition that $m_1$ is a constant function (usually $m_1 = 1$). Then, for each
$x_i$ and each sub-key hypothesis k, the prediction $z_i = F(x_i, k)$ is calculated.
The basis functions $m_p$ are then applied to the $z_i$ independently, leading to the
construction of an $(N \times s)$-matrix $M_k = (m_p(F(x_i, k)))_{i,p}$. The comparison of
this matrix with the set of d-dimensional leakages $(\vec{\ell}_i)_{i<N}$ is done by
processing a linear regression of each coordinate of $\vec{\ell}_i$ in the basis formed by the
row elements of $M_k$. Namely, a real-valued $(s \times d)$-matrix $B_k$ with column vectors
$\vec{\beta}_1, \ldots, \vec{\beta}_d$ is estimated in order to minimize the error when approximating $\vec{\ell}_i$
by $(m_1(F(x_i, k)), \ldots, m_s(F(x_i, k))) \times B_k$. The matrix $B_k$ is defined such that:

$$B_k = (M_k^{\top} \times M_k)^{-1} \times M_k^{\top} \times \mathcal{L} , \qquad (1)$$

where $\mathcal{L}$ denotes the $(N \times d)$-matrix with the $\vec{\ell}_i$ as row vectors. In the following,
the u-th column vector of $\mathcal{L}$ (composed of the u-th coordinate of all the $\vec{\ell}_i$) is
denoted by $\vec{\ell}[u]$. Moreover, the $(s \times N)$-matrix $(M_k^{\top} \times M_k)^{-1} \times M_k^{\top}$, which
does not depend on the leakage values, is denoted by $\mathcal{P}_k$.

To quantify the estimation error, the goodness of fit model is used and the
coefficient of determination $\mathcal{R}^2$ is computed for each u. The latter
is defined by $\mathcal{R}^2 = 1 - SSR/SST$, where SSR and SST respectively denote
the residual sum of squares (deduced from $B_k$) and the total sum of squares (see footnote 7)
(deduced from $\mathcal{L}$). We give in Algorithm 1 the pseudo-code corresponding to a
classical LRA attack processing.

6 The basis choice and its impact are not a trivial matter, see [TO] for a detailed study. 

7 For their exact definitions, see their construction in Alg. 1.
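Algorithm 1: LRA - Linear Regression Analysis

Input : a set of d-dimensional leakages $(\vec{\ell}_i)_{i<N}$ and the corresponding plaintexts
        $(x_i)_{i<N}$, a set of model functions $(m_p)_{p \le s}$
Output: A candidate sub-key $\hat{k}$

/* Processing of the leakage Total Sum of Squares (SST) */
1  for i = 0 to N - 1 do
2      $\vec{\mu} = \vec{\mu} + \vec{\ell}_i$
3      $\vec{\sigma} = \vec{\sigma} + \vec{\ell}_i^{\,2}$
4  $SST = \vec{\sigma} - \frac{1}{N} \vec{\mu}^{\,2}$

/* Processing of the $2^n$ prediction matrices $\mathcal{P}_k$ */
5  for k = 0 to $2^n$ - 1 do
6      $M_k$ = empty $(N \times s)$-matrix
7      for i = 0 to N - 1 do
8          $M_k[i] = (m_p(F(x_i, k)))_{p \le s}$
9      $\mathcal{P}_k = (M_k^{\top} \times M_k)^{-1} \times M_k^{\top}$

/* Instantaneous attacks processing */
10 for k = 0 to $2^n$ - 1 do
       /* Test hyp. k for all leakage coordinates */
11     for u = 0 to d - 1 do
           /* Compute an estimator $\vec{\beta}$ of $\vec{\ell}[u] = (\vec{\ell}_0[u], \ldots, \vec{\ell}_{N-1}[u])$ */
12         $\vec{\beta} = \mathcal{P}_k \times \vec{\ell}[u]$
13         $\vec{E} = M_k \times \vec{\beta}$
14         SSR = 0
15         for i = 0 to N - 1 do
16             $SSR = SSR + (\vec{E}[i] - \vec{\ell}_i[u])^2$
17         $\mathcal{R}[k][u] = 1 - SSR/SST[u]$

/* Most likely candidate selection */
18 candidate = $\mathrm{argmax}_k (\max_u \mathcal{R}[k][u])$
19 return candidate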
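As an illustration of the processing described in Alg. 1, the following Python sketch computes the matrix of instantaneous coefficients of determination and applies the classical selection rule. It is a minimal sketch under stated assumptions, not the implementation used for the experiments of this paper: the 4-bit PRESENT S-box stands in for F (so n = 4), the basis is a linear one (a constant plus one function per output bit), the traces are simulated with a single leaking coordinate, and the names `lra_scores`, `classical_candidate` and `simulate_traces` are hypothetical.

```python
import numpy as np

# Assumption: the 4-bit PRESENT S-box stands in for F(x, k) = SBOX[x XOR k].
SBOX = np.array([0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2])
N_BITS = 4


def linear_basis(z):
    """Return the matrix with rows (1, bit_0(z_i), ..., bit_3(z_i))."""
    bits = (z[:, None] >> np.arange(N_BITS)) & 1
    return np.hstack([np.ones((len(z), 1)), bits.astype(float)])


def lra_scores(traces, x):
    """Return the (2^n, d) array of coefficients of determination R[k][u]."""
    n_traces, d = traces.shape
    # Total sum of squares of every leakage coordinate.
    sst = (traces ** 2).sum(axis=0) - traces.sum(axis=0) ** 2 / n_traces
    scores = np.empty((2 ** N_BITS, d))
    for k in range(2 ** N_BITS):
        m_k = linear_basis(SBOX[x ^ k])                 # (N, s) matrix M_k
        p_k = np.linalg.solve(m_k.T @ m_k, m_k.T)       # (s, N) matrix P_k
        beta = p_k @ traces                             # weights for all u at once
        ssr = ((m_k @ beta - traces) ** 2).sum(axis=0)  # residual sum of squares
        scores[k] = 1.0 - ssr / sst
    return scores


def classical_candidate(scores):
    """Total maximisation: argmax over k of the maximum over u."""
    return int(np.argmax(scores.max(axis=1)))


def simulate_traces(key, n_traces=2000, d=6, leak_point=3, seed=0):
    """Simulated traces: Gaussian noise everywhere, one leaking coordinate."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2 ** N_BITS, size=n_traces)
    traces = rng.normal(0.0, 1.0, size=(n_traces, d))
    weights = np.array([1.0, 0.83, 1.27, 0.61])         # arbitrary bit weights
    bits = (SBOX[x ^ key][:, None] >> np.arange(N_BITS)) & 1
    traces[:, leak_point] = bits @ weights + rng.normal(0.0, 0.05, size=n_traces)
    return traces, x


traces, x = simulate_traces(key=0xB)
scores = lra_scores(traces, x)
print(classical_candidate(scores))
```

On such idealised traces the total maximisation is expected to behave well, since the non-leaking coordinates are pure noise; the sketch only shows the data flow of the algorithm and says nothing about measured traces.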


3.2 On the LRA Effectiveness 

Let us focus on the best candidate selection step in a classical LRA. Each sub-key
hypothesis k is first associated with a score which is the greatest instantaneous
coefficient of determination when testing it for all temporal coordinates u. It
is denoted by $\max_u \mathcal{R}[k][u]$ in Alg. 1. The second phase of the selection
consists in the processing of the maximum $\mathrm{argmax}_k (\max_u \mathcal{R}[k][u])$. The purpose of
the latter step is to identify the candidate that maximises the greatest
instantaneous coefficient. Implicitly, such a classical approach by total maximisation
of the distinguisher value assumes that the most likely candidate corresponds to
the greatest value taken by the distinguisher not only over all sub-key hypotheses
but also over all the leakage times. This assumption relies on another one, often
made in the embedded security community, which states that the value of a
distinguisher computed between wrong hypotheses (i.e. computed for a wrong
sub-key value or a wrong time) and the leakage values tends toward its minimum
value (often 0) when the sample size N increases (see e.g. [ID]). However, as
already noticed in several papers (e.g. by Messerges in [53], Brier et al. in [5] or
by Whitnall et al. in @0]), both assumptions are often not verified in practice,
where the adversary must for instance deal with the ghost peaks phenomenon.
The situation is even worse for the LRA attacks since the vector of coefficients
$\vec{\beta}$ (and thus the set of predictions) depends not only on k but also on the attack
time u. The strength of the LRA, namely its ability to adapt to the instantaneous
leakage, is also its weakness as it makes it difficult to compare the results of the
different instantaneous attacks.

To illustrate the issue raised in the previous paragraph, we ran an
LRA against an AES sbox processing running on Device B (see Section 2.5). The
full leakage traces were composed of 16800 points. We performed the attack
on the full trace length and, for each time coordinate, we recorded the scores
of all the 256 key-candidates after N = 1000 observations. For clarity reasons,
we present in Figure 2 the results only for a temporal window of size 250 points
where the targeted variable was known to occur. In the top of the figure, the rank
of the correct key is plotted and it can be observed that it is 0 only at a few times.
In the second trace of Figure 2, the instantaneous maximum scores comprised in
[0.9982, 0.999] are plotted (see footnote 8): it may be checked that the maximum among those
scores corresponds to a time (t = 238) when the correct key is not ranked first.
This explains why the total maximisation approach fails in returning the correct
key candidate in this case.

To build a better rule than the total maximisation test, we respectively plotted
in the third and fourth traces of Figure 2 the mean (plain green trace) and the
variance (plain red trace) of the instantaneous scores (i.e. the values
$\mu(u) = 2^{-8} \sum_k \mathcal{R}[k][u]$ and $\sigma(u) = 2^{-8} \sum_k (\mathcal{R}[k][u] - \mu(u))^2$ with u denoting the time
coordinate in abscissa). For each time, we also plotted in black dashed line the
maximum score $\max(u) = \max_k (\mathcal{R}[k][u])$. It may be observed that the correct
key is ranked first at the times u when the distance $\max(u) - \mu(u)$ is large
and $\sigma(u)$ is small. The third (red) trace and the fourth (gray) trace aim at
supporting this claim. Eventually, they suggest the following pre-processing
before comparing the instantaneous attack results: for each leakage coordinate,
center the maximum of the coefficients of determination and divide it by their
standard deviation. The resulting scoring is plotted in the fifth (magenta) trace,
where it can be checked that the maximum is indeed achieved for the correct
key.


8 For visibility purposes, we chose not to plot the scores lower than 0.9982.
Fig. 2. LRA on Device B (over 1000 traces): Scores Statistics

As a conclusion, and in the light of our analysis, we propose to replace the
candidate selection step of the LRA by the following one (see footnote 9):

18 for u = 0 to d - 1 do
19     attackRes[u] = $\left( \mathrm{argmax}_k (\mathcal{R}[k][u]),\ \frac{\max_k (\mathcal{R}[k][u]) - \mathrm{E}_k[\mathcal{R}[k][u]]}{\sqrt{\mathrm{Var}_k[\mathcal{R}[k][u]]}} \right)$
20 candidate = arg1max2(attackRes)


In Section 3.4, our score pre-processing technique is applied to attack samples
of Device A and Device C in order to test whether our observations, about (1)
the ineffectiveness of the classical LRA and (2) the soundness of the new
pre-processing, stay valid for other devices than Device B.
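The difference between the two selection rules can be sketched on a toy score matrix. The Python code below is only an illustration under stated assumptions: the array `scores` is hand-made (it is not measured data), it holds R[k][u] for 4 hypotheses and 3 time coordinates, and the function names are hypothetical. The standard deviation is taken over the hypotheses k for each coordinate u, as in the pre-processing described above.

```python
import numpy as np


def classical_candidate(scores):
    """Total maximisation: argmax over k of the maximum over u of R[k][u]."""
    return int(np.argmax(scores.max(axis=1)))


def normalised_candidate(scores):
    """Centre the best score of each coordinate and divide by the std over k."""
    best_key = scores.argmax(axis=0)                      # argmax_k R[k][u]
    centred = scores.max(axis=0) - scores.mean(axis=0)    # max_k minus mean over k
    normalised = centred / scores.std(axis=0)             # divide by std over k
    return int(best_key[np.argmax(normalised)])           # role of arg1max2


# Toy scores R[k][u]: rows are hypotheses k, columns are time coordinates u.
# Column 0 mimics a coordinate where every hypothesis fits almost perfectly;
# column 1 mimics a coordinate where hypothesis 2 clearly stands out.
scores = np.array([
    [0.9995, 0.10, 0.20],
    [0.9990, 0.11, 0.21],
    [0.9991, 0.60, 0.19],
    [0.9992, 0.09, 0.20],
])
print(classical_candidate(scores), normalised_candidate(scores))  # prints: 0 2
```

In this toy case the total maximisation is attracted by the coordinate where all scores are close to 1, whereas the normalised rule selects the hypothesis that stands out from the others at its own coordinate.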


3.3 On the LRA Efficiency 

The construction of the prediction matrices in Alg. 1 implies, for each k, the
processing of 3 products of matrices with one dimension equal to s (number
of basis functions) and the second dimension equal to N (number of leakage
traces). The processing of the instantaneous attacks also requires two such matrix
products for each pair (k, u) with u < d. This makes the application of a linear
regression attack as depicted in Alg. 1 difficult to perform (and even impossible)
when the number N of leakage traces and/or the number d of attack times are

9 Where arg1max2 is a function returning the first coordinate of the maximum of an
array of 2-dimensional elements, the maximisation being computed with respect to 
the second coordinate of the array elements. 
large. Fortunately, this complexity can be significantly reduced. It can indeed
be easily shown (see [TO]) that the processing of the vectors $\vec{\beta}$ is unchanged if
performed for the set of averaged leakages $\left( \frac{\sum_{i, x_i = x} \vec{\ell}_i}{\#\{i; x_i = x\}} \right)_{x \in \mathbb{F}_2^n}$ instead of
$(\vec{\ell}_i)_{i<N}$.

Actually, this amounts to changing the definition of the matrices $M_k$ and $\mathcal{L}$
in (1) such that $M_k = (m_p(F(x, k)))_{x \in \mathbb{F}_2^n, p \le s}$ and $\mathcal{L}$ is a $(2^n \times d)$-matrix whose x-th
row vector $\mathcal{L}[x]$ equals $\frac{\sum_{i, x_i = x} \vec{\ell}_i}{\#\{i; x_i = x\}}$. This improvement essentially leaves
the first 9 steps of Alg. 1 unchanged except the loop 7-8 which is now computed
over $x \in \mathbb{F}_2^n$ instead of over $i \in [0; N-1]$. Then, before Step 10, the following
processing is done to compute the elements of the matrix $\mathcal{L}$:


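for i = 0 to N - 1 do
    count[$x_i$] = count[$x_i$] + 1
    $\mathcal{L}[x_i] = \mathcal{L}[x_i] + \vec{\ell}_i$

for x = 0 to $2^n$ - 1 do
    $\mathcal{L}[x] = \mathcal{L}[x]/\mathrm{count}[x]$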


Eventually, Steps 10-17 are replaced by the following ones where we recall
that $\vec{\ell}[u]$ denotes the u-th column vector of $\mathcal{L}$.



The efficiency improvements proposed here for the LRA attack allow for a
significant time/memory gain. First, they replace the $(N \times s)$-matrix products at
Step 13 by $(2^n \times s)$-matrix products. More globally, the complexity is reduced
from $O(s \times d \times N)$ to $O(s \times d \times 2^n)$. If the $\vec{\beta}$ values are not needed (i.e. the
weights of the linear regression are of no interest to the attacker), the matrix
products $M_k \times \mathcal{P}_k$ can also be pre-processed. This saves one matrix
product per loop iteration (over k and u).
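The partitioning idea can be sketched as follows: the traces are first averaged per input value x, and the regression is then run on the $2^n$ averaged traces. This is a minimal sketch under stated assumptions, not the authors' implementation: the 4-bit PRESENT S-box stands in for F, the basis is linear, the traces are random, and all function names are hypothetical. In this sketch every input value occurs equally often, in which case the two regressions return the same weights; with unbalanced counts, the regression on the averages would have to be weighted by count[x] to stay exactly equivalent.

```python
import numpy as np

# Assumption: the 4-bit PRESENT S-box stands in for F(x, k) = SBOX[x XOR k].
SBOX = np.array([0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2])
N_VALUES = len(SBOX)
N_BITS = 4


def linear_basis(z):
    """Return the matrix with rows (1, bit_0(z), ..., bit_3(z))."""
    bits = (z[:, None] >> np.arange(N_BITS)) & 1
    return np.hstack([np.ones((len(z), 1)), bits.astype(float)])


def average_by_input(traces, x):
    """Return the (2^n, d) matrix whose x-th row is the mean trace for input x."""
    counts = np.bincount(x, minlength=N_VALUES)
    sums = np.zeros((N_VALUES, traces.shape[1]))
    np.add.at(sums, x, traces)                 # sums[x_i] += traces[i]
    return sums / counts[:, None]


def regress(m_k, leakages):
    """Least-squares weights B_k = (M^T M)^{-1} M^T L."""
    return np.linalg.solve(m_k.T @ m_k, m_k.T @ leakages)


rng = np.random.default_rng(2)
x = rng.permutation(np.tile(np.arange(N_VALUES), 100))   # balanced inputs
traces = rng.normal(size=(len(x), 5))
averaged = average_by_input(traces, x)                   # computed once for all k
inputs = np.arange(N_VALUES)
same = all(
    np.allclose(regress(linear_basis(SBOX[x ^ k]), traces),
                regress(linear_basis(SBOX[inputs ^ k]), averaged))
    for k in range(N_VALUES)
)
print(same)
```

The averaging pass is done once over the N traces; every subsequent regression then only involves matrices with $2^n$ rows.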
3.4 Experiments

We ran the classical and improved LRA against three copies of
Devices A, B and C (see Section 2.5). The attacks target four bytes of the AES state
after the first SubBytes operation and they are applied on the full side channel traces.
Each attack has been performed 10 times against each of the three copies. The
average rank over the four correct sub-keys is plotted in Figure 3 for each device. We
recall that the rank of a sub-key k is here defined as the position of $\max_u \mathcal{R}[k][u]$ in
the vector $(\max_u \mathcal{R}[k][u])_k$ after sorting (see Section 3.2 for a discussion about this
choice). The experiments reported in Figure 3(a)-(c) are done with a linear basis
(i.e. the functions $m_i$ were chosen such that $m_0$ is constant equal to 1 and $m_i$, with
$1 \le i \le 8$, returns the i-th bit of its input). It may be observed that the classical attack
always failed whereas the improved one succeeded with less than 2500 observations
(and even less than 800 for Device B).



(a) LRA on Device A (b) LRA on Device B (c) LRA on Device C 

Fig. 3. LRA campaign - Rank evolution versus number of observations 


4 Practical Evaluation of Template Attacks 

Template attacks were introduced in 2002 by Chari et al. [§]. Subsequent
works have then been published which either show how to apply them
against particular implementations (e.g. AES, RSA or ECDSA) or propose
efficiency/effectiveness improvements [I]|31I271|!2H]. In [27], the authors reduce the
complexity of template attacks by first applying a pre-processing on the
measurements (to go from time domain to frequency domain) and then by applying
dimension reduction techniques (e.g. PCA). The latter idea is also followed in [1]
and [3]. In all those papers, the improvement of the template attacks efficiency
is not studied at the algorithmic level. Moreover, the reported template attack
experiments involve the same device for the profiling and matching phases of the
attacks, which strongly reduces the practical significance of the argumentation.
Indeed, as the profiling phase requires full access to the device (and in
particular the ability to choose the secret parameter), the latter experiments do not fit
with the large majority of real attack/evaluation contexts where the adversary
has no (or very little) control over the target device. In a more realistic attacker
model, the profiling phase is conducted on a different device. For such a model,
we have the following well-known question (see footnote 10) about the efficiency of template
attacks: how sound/relevant is a profiling done on a device A when attacking
another device B? The first work, and to the best of our knowledge the only
one, reporting on template attacks in such a context is due to Renauld et al. [21].
In the latter article, the two devices used for the experiments are test chips
implementing an AES s-box and made in 65-nanometer CMOS technology.

The results presented in the rest of this section improve the state-of-the-art
recalled previously on two points. First, the efficiency improvement is done at
the algorithmic level. It can hence be combined with the previous improvements
which essentially correspond to pre-processing of the measurement traces. Secondly,
the reported experiments concern a full AES implementation running on 3
different samples of 3 different technologies. This allowed us to complete the analyses
done in [29] and to draw, for the first time, conclusions about the template attack
efficiency for realistic scenarios.


4.1 Attack Description 

A template attack (TA for short) assumes that a preliminary profiling step has
been performed on an open copy of the targeted device. During this phase, the
adversary has measured N' leakage traces $\vec{\ell}_i$ for which he knows exactly the
values taken by the corresponding sensitive value Z (which also implies that
he knows the corresponding sub-key k). Those leakages have then been used to
compute estimations $f_z(\cdot)$ of the probability density function of $(\vec{L} \mid Z = z)$ for
all possible z (which imposes $N' \gg 2^m$). The pdf estimations $f_z(\cdot)$ will play, in
a template attack, a similar role as the model functions in a CPA or LRA.

Once the adversary has the set of pdf estimations $(f_z(\cdot))_{z \in \mathbb{F}_2^m}$ in hand, a
TA against the set of traces $(\vec{\ell}_i)_{i<N}$ (for which the secrets are unknown)
follows essentially the same outlines as the LRA: the hypothesis k is tested by first
computing the predictions $z_i = F(x_i, k)$ and then by calculating the product
$\prod_{i<N} f_{z_i}(\vec{\ell}_i)$. Usually, the pdf of the variables $(\vec{L} \mid Z = z)$ is estimated by a
multivariate normal law, which implies that $f_z$ can be developed s.t.:

$$f_z(\vec{\ell}) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma_z)}} \exp\left( -\frac{1}{2} (\vec{\ell} - \vec{\mu}_z)^{\top} \Sigma_z^{-1} (\vec{\ell} - \vec{\mu}_z) \right) , \qquad (2)$$

where $\Sigma_z$ denotes the $(d \times d)$-matrix of covariances of $(\vec{L} \mid Z = z)$ and where the
d-dimensional vector $\vec{\mu}_z$ denotes its mean.

To minimize approximation errors induced by the processing of the product
of exponential values, one usually prefers, in practice, a log-maximum likelihood
processing to the classical maximum likelihood (see footnote 11). Together with (2), this leads
to the following computation to test the hypothesis k:


10 This question is sometimes also related to the statistical problem of the
robustness of pdf estimations [25].

11 The two processes discriminate equivalently. 
$$\mathcal{ML}[k] = -\sum_{i<N} (\vec{\ell}_i - \vec{\mu}_{z_i})^{\top} \Sigma_{z_i}^{-1} (\vec{\ell}_i - \vec{\mu}_{z_i}) - \sum_{i<N} \log\left( (2\pi)^{d} \det(\Sigma_{z_i}) \right) . \qquad (3)$$


We give in Alg. 2 the pseudo-code corresponding to the TA attack discussed
previously.

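Algorithm 2: TA - Template Attacks

Input : a set of d-dimensional leakages $(\vec{\ell}_i)_{i<N}$ and the corresponding plaintexts
        $(x_i)_{i<N}$, a set of pdf estimations $(\vec{\mu}_z, \Sigma_z)_{z \in \mathbb{F}_2^m}$
Output: A candidate sub-key $\hat{k}$

/* Processing of the log-determinants and of the inverses $\Sigma_z^{-1}$ */
1  for z = 0 to $2^m$ - 1 do
2      logDet$_z$ = $\log((2\pi)^{d} \det(\Sigma_z))$
3      invCov$_z$ = $\Sigma_z^{-1}$

/* Instantaneous TA attacks Processing */
4  for k = 0 to $2^n$ - 1 do
       /* Test hyp. k */
5      $\mathcal{ML}[k]$ = 0
6      for i = 0 to N - 1 do
7          z = $F(x_i, k)$
8          $\mathcal{ML}[k] = \mathcal{ML}[k] - (\vec{\ell}_i - \vec{\mu}_z)^{\top} \times \mathrm{invCov}_z \times (\vec{\ell}_i - \vec{\mu}_z) - \mathrm{logDet}_z$

/* Most likely candidate selection */
9  candidate = $\mathrm{argmax}_k (\mathcal{ML}[k])$
10 return candidate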
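The trace-by-trace scoring of Alg. 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code used for the experiments: the 4-bit PRESENT S-box stands in for F (so n = m = 4), the templates (`means`, `covs`) are synthetic Gaussian templates chosen by hand, the attack traces are simulated from them, and the name `ta_scores` is hypothetical.

```python
import numpy as np

# Assumption: the 4-bit PRESENT S-box stands in for F(x, k) = SBOX[x XOR k].
SBOX = np.array([0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2])
N_VALUES = len(SBOX)


def ta_scores(traces, x, means, covs):
    """Return the log-likelihood score ML[k] of every sub-key hypothesis k."""
    d = traces.shape[1]
    inv_cov = np.linalg.inv(covs)                       # stacked inverses, (2^m, d, d)
    # log((2*pi)^d * det(Sigma_z)) for every template z.
    log_det = d * np.log(2 * np.pi) + np.linalg.slogdet(covs)[1]
    scores = np.zeros(N_VALUES)
    for k in range(N_VALUES):
        for trace, x_i in zip(traces, x):
            z = SBOX[x_i ^ k]                           # prediction for hypothesis k
            diff = trace - means[z]
            scores[k] -= diff @ inv_cov[z] @ diff + log_det[z]
    return scores


# Synthetic templates: distinct mean vectors, identical diagonal covariances.
rng = np.random.default_rng(0)
key, n_traces, d = 0x7, 50, 3
values = np.arange(N_VALUES)
means = np.stack([values, 15 - values, (3 * values) % 16], axis=1).astype(float)
covs = np.tile(0.25 * np.eye(d), (N_VALUES, 1, 1))
x = rng.integers(0, N_VALUES, size=n_traces)
traces = means[SBOX[x ^ key]] + rng.normal(0.0, 0.5, size=(n_traces, d))
scores = ta_scores(traces, x, means, covs)
print(int(np.argmax(scores)))
```

The double loop makes the cost of each hypothesis proportional to the number N of attack traces, which is the point addressed by the partitioned rewriting.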





4.2 On the TA Effectiveness 

The idea developed in previous sections to improve the selection of the best
candidate among the results of several instantaneous attacks is not relevant here.
Indeed, for both the profiling and attack phases, a template attack exploits,
by nature, all the leakage coordinates of the $\vec{\ell}_i$ simultaneously. There is
consequently no need to compare the results of several (different) instantaneous
attacks.

4.3 On the TA Efficiency 

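Applying the same idea as for the LRA, we propose hereafter an alternative
writing of $\mathcal{ML}[k]$ that leads to a much faster attack processing. For such a
purpose, we focus on the term $(\vec{\ell}_i - \vec{\mu}_{z_i})^{\top} \Sigma_{z_i}^{-1} (\vec{\ell}_i - \vec{\mu}_{z_i})$ in (3).

After denoting by $L_i$ each $(d \times d)$-matrix $(\vec{\ell}_i[u] \, \vec{\ell}_i[u'])_{u,u'}$, we get the
following rewriting of the latter term:

$$\sum_{u,u'} \left( L_i[u][u'] \times \Sigma_{z_i}^{-1}[u][u'] \right) - \vec{\mu}_{z_i}^{\top} \times (\Sigma_{z_i}^{-1} + \Sigma_{z_i}^{-1\top}) \times \vec{\ell}_i + \vec{\mu}_{z_i}^{\top} \times \Sigma_{z_i}^{-1} \times \vec{\mu}_{z_i}$$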
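The rewriting of the quadratic term follows from a direct expansion. As a worked check, with the shorthand $A$ for the inverse covariance matrix, $\vec{\ell}$ for the trace and $\vec{\mu}$ for the template mean (shorthand introduced here for readability only):

```latex
\begin{align*}
(\vec{\ell} - \vec{\mu})^{\top} A \, (\vec{\ell} - \vec{\mu})
  &= \vec{\ell}^{\top} A \vec{\ell}
     - \vec{\ell}^{\top} A \vec{\mu}
     - \vec{\mu}^{\top} A \vec{\ell}
     + \vec{\mu}^{\top} A \vec{\mu} \\
  &= \sum_{u,u'} \vec{\ell}[u] \, \vec{\ell}[u'] \, A[u][u']
     - \vec{\mu}^{\top} (A + A^{\top}) \vec{\ell}
     + \vec{\mu}^{\top} A \vec{\mu} ,
\end{align*}
```

where the second line uses the fact that the scalar $\vec{\ell}^{\top} A \vec{\mu}$ equals its own transpose $\vec{\mu}^{\top} A^{\top} \vec{\ell}$. Only the first two terms depend on the trace, and they do so through the products $\vec{\ell}[u] \vec{\ell}[u']$ and through $\vec{\ell}$ itself, which is what allows sums over traces to be accumulated beforehand.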
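After recalling that $z_i$ equals $F(x_i, k)$ and after denoting $F(x, k)$ by z and
$\#\{i; x_i = x\}$ by $N_x$, we deduce that the sum $\sum_i (\vec{\ell}_i - \vec{\mu}_{z_i})^{\top} \Sigma_{z_i}^{-1} (\vec{\ell}_i - \vec{\mu}_{z_i})$
may be rewritten:

$$\sum_{x \in \mathbb{F}_2^n} \left( \sum_{u,u'} \Big( \sum_{i, x_i = x} L_i[u][u'] \Big) \times \Sigma_z^{-1}[u][u'] - \vec{\mu}_z^{\top} \times (\Sigma_z^{-1} + \Sigma_z^{-1\top}) \times \Big( \sum_{i, x_i = x} \vec{\ell}_i \Big) + N_x \times \vec{\mu}_z^{\top} \times \Sigma_z^{-1} \times \vec{\mu}_z \right) .$$

As a consequence, if the $2^n$ possible sums $\sum_{i, x_i = x} L_i$ and $\sum_{i, x_i = x} \vec{\ell}_i$ have
been precomputed, then the complexity of evaluating (3) for each k goes from
$O(N d^2)$ to $O(2^n d^2)$. Algorithm 3 describes the improved TA attack.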


Algorithm 3: TA - Template Attacks (Improved Version)

Input : a set of N leakages $(\vec{\ell}_i)_{i<N}$ and the corresponding plaintexts $(x_i)_{i<N}$, a set of pdf
        estimations $(\vec{\mu}_z, \Sigma_z)_{z \in \mathbb{F}_2^m}$
Output: A candidate subkey $\hat{k}$

/* Pre-Processing of the predictions data */
1  for z = 0 to $2^m$ - 1 do
2      logDet$_z$ = $\log((2\pi)^{d} \det(\Sigma_z))$; invCov$_z$ = $\Sigma_z^{-1}$; meanCov$_z$ = $\vec{\mu}_z^{\top} \times \mathrm{invCov}_z \times \vec{\mu}_z$;
       sumMeanCov$_z$ = $\vec{\mu}_z^{\top} \times (\Sigma_z^{-1} + \Sigma_z^{-1\top})$

/* Pre-Processing of the leakage data: $\vec{\ell}_x = \sum_{i, x_i = x} \vec{\ell}_i$, $L_x = \sum_{i, x_i = x} L_i$ and
   $N[x] = \#\{i; x_i = x\}$ */
3  for i = 0 to N - 1 do
4      x = $x_i$; N[x] = N[x] + 1; $\vec{\ell}_x = \vec{\ell}_x + \vec{\ell}_i$; $L_x = L_x + L_i$

/* Instantaneous TA attacks Processing */
5  for k = 0 to $2^n$ - 1 do
       /* Test hyp. k */
6      $\mathcal{ML}[k]$ = 0
7      for x = 0 to $2^n$ - 1 do
8          z = $F(x, k)$
9          for u = 0 to d - 1 do
10             for u' = 0 to d - 1 do
11                 $\mathcal{ML}[k] = \mathcal{ML}[k] - L_x[u][u'] \times \mathrm{invCov}_z[u][u']$
12         $\mathcal{ML}[k] = \mathcal{ML}[k] + \mathrm{sumMeanCov}_z \times \vec{\ell}_x - N[x] \times (\mathrm{meanCov}_z + \mathrm{logDet}_z)$

/* Most likely candidate selection */
13 candidate = $\mathrm{argmax}_k (\mathcal{ML}[k])$
14 return candidate
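The equivalence between the trace-by-trace scoring and the partitioned one can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the 4-bit PRESENT S-box stands in for F (so n = m = 4), the templates are random (with symmetric positive definite covariances), the traces are random as well since only the algebraic identity is checked, and all function names are hypothetical.

```python
import numpy as np

# Assumption: the 4-bit PRESENT S-box stands in for F(x, k) = SBOX[x XOR k].
SBOX = np.array([0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2])
N_VALUES = len(SBOX)


def template_constants(means, covs):
    """Per-template quantities that do not depend on the attack traces."""
    d = means.shape[1]
    inv_cov = np.linalg.inv(covs)
    log_det = d * np.log(2 * np.pi) + np.linalg.slogdet(covs)[1]
    return inv_cov, log_det


def ta_scores_reference(traces, x, means, covs):
    """Trace-by-trace scoring: cost proportional to N for each hypothesis."""
    inv_cov, log_det = template_constants(means, covs)
    scores = np.zeros(N_VALUES)
    for k in range(N_VALUES):
        for trace, x_i in zip(traces, x):
            z = SBOX[x_i ^ k]
            diff = trace - means[z]
            scores[k] -= diff @ inv_cov[z] @ diff + log_det[z]
    return scores


def ta_scores_partitioned(traces, x, means, covs):
    """Scoring from per-input sums: cost proportional to 2^n for each hypothesis."""
    d = traces.shape[1]
    inv_cov, log_det = template_constants(means, covs)
    mean_cov = np.einsum("zu,zuv,zv->z", means, inv_cov, means)
    sum_mean_cov = np.einsum("zu,zuv->zv", means, inv_cov + inv_cov.transpose(0, 2, 1))
    # One pass over the traces, accumulating the sums for every input value x.
    counts = np.bincount(x, minlength=N_VALUES)
    sum_l = np.zeros((N_VALUES, d))
    np.add.at(sum_l, x, traces)
    sum_ll = np.zeros((N_VALUES, d, d))
    np.add.at(sum_ll, x, traces[:, :, None] * traces[:, None, :])
    scores = np.zeros(N_VALUES)
    for k in range(N_VALUES):
        for x_val in range(N_VALUES):
            z = SBOX[x_val ^ k]
            scores[k] -= (sum_ll[x_val] * inv_cov[z]).sum()
            scores[k] += sum_mean_cov[z] @ sum_l[x_val]
            scores[k] -= counts[x_val] * (mean_cov[z] + log_det[z])
    return scores


rng = np.random.default_rng(1)
d = 3
means = rng.normal(size=(N_VALUES, d))
a = rng.normal(size=(N_VALUES, d, d))
covs = a @ a.transpose(0, 2, 1) + np.eye(d)      # symmetric positive definite
x = rng.integers(0, N_VALUES, size=500)
traces = rng.normal(size=(500, d))
reference = ta_scores_reference(traces, x, means, covs)
partitioned = ta_scores_partitioned(traces, x, means, covs)
print(np.allclose(reference, partitioned))
```

The accumulation pass touches each trace once; after it, the scoring of the hypotheses no longer depends on N.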
4.4 Experiments

To study the effectiveness of TA attacks in practice (and to confirm the
observations reported in [55]), we ran them against the families of devices A,
B and C for three different scenarios. In the first scenario (referred to as "copy
1 → copy 1"), the profiling and the attacks are performed on the same device
copy. In the second and third scenarios (respectively referred to as "copy 1 →
copy 2" and "copy 1 → copy 3"), the profiling made for copy 1 is used to attack
the second and third copies. For each of these 9 attack frameworks, we plot
in Figure 4 the average rank of the correct sub-key (in color) with respect to
both the number of traces used for the profiling (in ordinate) and the number of
traces used for the attack (in abscissa). The rank averaging has been done over
10 attacks.

In the first scenario, a profiling done on 15000 (resp. 47000) traces on Device
B (resp. Device C) allows for a very efficient attack phase (the correct sub-key is
ranked first with less than ten traces). Moreover, it may be observed that a
profiling on 8000 traces for Devices B and C is sufficient to have a successful
attack in less than 23 (resp. 90) traces for Device B (resp. C). For Device A, the
TA attack in Scenario 1 is one order of magnitude less efficient (roughly speaking,
the values are multiplied by ten w.r.t. the traces for Devices B and C).

Attacks on Devices B (resp. C) perform quite similarly in Scenarios 2 and 3. 
For Device A, a profiling performed on copy 1 for 18000 traces is sufficient to 
successfully attack copies 2 and 3 with less than 10 traces. Moreover, a profiling 
on 8000 traces enables successful attacks for less than 30 traces. For Device C, it 
may be observed that, even for a profiling performed on 50000 traces, the attacks 
on copies 2 and 3 require at least 80 traces to succeed. However, a profiling on 
9000 traces is sufficient to have the TA succeeding in less than 130 traces. 

As expected, we may observe a significant variability for the attack results in
Scenarios 2 and 3 for Device A: templates done on copy 1 are almost as efficient
to attack copy 2 as they were to attack copy 1 itself. They are however much
less informative on the copy 3 behaviour since the profiling on copy 1 must be
performed on at least 130000 traces to see the attack working on copy 3 with
less than 700 traces. This observation is in line with those made in [55] about the
high variability of nano-scale technologies (we recall that Device A is made in a
90nm CMOS technology).

In the full version of this paper, we report on similar experimental results
when only the leakage means (and not the covariance matrices) are involved
in the templates. This approach indeed seems to be a natural alternative to
the attacks described here since the traces contain instantaneous leakages. Our
results actually confirm this feeling and it can even be noticed that it
improves the TA efficiency for Scenario 3 on Device A (see footnote 13). Another general remark
on these simplified templates is that they perform much better than the full ones
when the number of traces used for the profiling is small (around 4000).


13 This could be explained by the fact that the technology variability has more impact 
on the electromagnetic leakage variances than it has on the means. 
(a) Dev. A: copy 1 → copy 1 (b) Dev. A: copy 1 → copy 2 (c) Dev. A: copy 1 → copy 3

(d) Dev. B: copy 1 → copy 1 (e) Dev. B: copy 1 → copy 2 (f) Dev. B: copy 1 → copy 3

(g) Dev. C: copy 1 → copy 1 (h) Dev. C: copy 1 → copy 2 (i) Dev. C: copy 1 → copy 3

Fig. 4. TA campaigns - Rank evolution vs. nb. of traces for the attack phase (x-axis) 
and the profiling (y-axis) 


5 Conclusion 

In this paper, we have studied the effectiveness and efficiency of the LRA and the
TA attacks when performed in a context where the exact time of the sensitive
computations is not known. In this situation, and even after the application of
pattern matching or resynchronization techniques, the exploited leakage traces
may be composed of several thousands of points and the same attack must
be processed for each of those points. We noticed that the study of the
effectiveness and efficiency of side channel attacks in this multivariate context is an
over-estimated problem. Most of the time, it is indeed assumed that the adversary
succeeded in significantly reducing the traces size (e.g. by first processing an
SNR, or a test attack, or even a dimension reduction). However, as argued in
this paper, those techniques are either unrealistic or may lead to a significant
loss of useful information (a dimension reduction technique like the PCA may
be sound for one attack, e.g. the CPA, and not for another one, e.g. the MIA or
the LRA). As a consequence, there was no work discussing the rule to
apply in order to select a candidate among all of those returned by the same attack
performed against several time coordinates. To the best of our knowledge, the
de facto rule was hence to simply choose the key candidate maximising all the
attack scores. In this paper, we have shown that this rule does not work for an
LRA attack and we have conducted a statistical analysis to deduce a new selection
rule that renders it effective in practice, even when the traces are composed of
a huge number of points. In this paper, we have also tackled the efficiency
problem for the multivariate LRA and TA attacks. For each of them, we have
followed a similar approach which led us to significantly reduce their complexity
when the number of traces and their dimension are high. It may be noticed that
the approach could also be applied (almost straightforwardly) to improve the
efficiency of the correlation power attack and of the mutual information attack
(with histogram pdf estimation). Eventually, all our results and analyses have
been illustrated by several attack experiments on three different copies of three
different technologies. In particular, the latter experiments have enabled us to
confirm the practicability of template attacks when the profiling phase and the
attack are performed on different copies of the same device.
Abstract. Side-Channel Analysis (SCA) is commonly used to recover
secret keys involved in the implementation of publicly known crypto- 
graphic algorithms. On the other hand, Side-Channel Analysis for 
Reverse Engineering (SCARE) considers an adversary who aims at re- 
covering the secret design of some cryptographic algorithm from its 
implementation. Most previously published SCARE attacks enable
the recovery of some secret parts of a cipher design (e.g. the substitution
box(es)), assuming that the rest of the cipher is known. Moreover,
these attacks are often based on idealized leakage assumptions where
the adversary recovers noise-free side-channel information. In this
paper, we address these limitations and describe a generic SCARE attack
that can recover the full secret design of any iterated block cipher with 
common structure. Specifically we consider the family of Substitution- 
Permutation Networks with either a classical structure (as the AES) or 
with a Feistel structure. Based on a simple and usual assumption on 
the side-channel leakage we show how to recover all parts of the design 
of such ciphers. We then relax our assumption and describe a practical 
SCARE attack that deals with noisy side-channel leakages. 


1 Introduction 

Side-Channel Analysis for Reverse Engineering (SCARE) refers to a set of tech- 
niques that exploit side-channel information to recover secret algorithms and/or 
software/hardware designs. One of the main applications of SCARE is the recovery
of symmetric ciphering algorithms of private design, as often used in Pay-TV
and GSM authentication protocols. The first SCARE attack in this context was 
introduced by Novak [25], who showed how to recover one out of two s-boxes
from a secret instance of the A3/A8 algorithm (used in the GSM protocol). This work
was subsequently improved by Clavier [10] who described how to recover both
s-boxes together with the secret key used by the cipher. In parallel to these
results, Daudigny et al. D3] showed that simple secret modifications of the DES 
cipher could also be recovered from side-channel observations. In a more recent 
work, Real et al. m took a closer look at Feistel schemes in a more general 
sense. They showed how an adversary that gets the Hamming weight of some 
intermediate result can interpolate the round function of the cipher. Eventually, 
a SCARE attack on stream ciphers was proposed by Guilley et al. jT5] . They
showed how to retrieve the overall design when either the linear or the nonlinear 
part of the cipher is known. 

Our Contribution. In this paper, we introduce a SCARE attack that recovers 
the full secret design of an iterated Substitution-Permutation Network (SPN for 
short), namely an iterated cipher composed of substitution boxes (or s-boxes), 
linear layers and key additions. As in [25,10], our attack is based on the simple
assumption that the side-channel leakage enables the detection of colliding s- 
box computations. Specifically, the attacker is able to select strips of side-channel 
traces where the s-box computations are located and decide on collisions between 
the processed values from the observation of these traces. This assumption has 
been the basis of various previously published side-channel key-recovery attacks 
(see for instance mmmMsimm)- We first show how a perfect detection 
of colliding s-box computations enables an efficient recovery of a secret cipher 
with classical SPN structure as the one of the AES M- Roughly speaking, the 
collision detection mechanism allows us to build simple linear equation systems 
involving the different unknowns of the cipher algorithm (i.e. the s-box values, 
the linear layer coefficients, the secret round key coordinates). In the full version 
of the paper [29] , we further extend our basic attack to relax as much as possible 
the constraints on the design, allowing several different s-boxes, binary linear 
layers, and Feistel structures, so that we cover a wide spectrum of usual block 
cipher designs. In the second part of this paper, we address the practical aspects 
of our attack and relax the perfect detection assumption. We describe a practical 
SCARE attack working in the presence of noise in the side-channel leakage and 
we provide experimental results showing its practicability. 

Related Work. In a recent independent work m , Clavier et al. present a 
SCARE attack against AES-like block ciphers. The authors consider a chosen- 
plaintext and known-ciphertext scenario with perfect detection of colliding s- 
boxes. Under these assumptions, they show how to efficiently recover the secret 
parameters of a modified AES. They further address the case of protected
implementations with common software countermeasures against side-channel attacks.
In comparison, our attack targets a wider class of SPN ciphers, including mod- 
ified AES ciphers as a particular case. Moreover, we extend our attack to deal 
with noisy leakages, hence relaxing the perfect detection assumption. However, 
we do not deal with the case of protected implementations (though we give a
few insights about it in the last section).

Paper Organization. In Section 2 we describe the design of the target SPN
block ciphers. Then we present our generic SCARE attack in Section 3. The
practical SCARE attack dealing with noisy leakages is described in Section 4 and
experimental results are presented in Section 5. We conclude with some discussions
and perspectives in the last section.
2 Substitution-Permutation Networks


We consider a block cipher $E$ computing an $\ell$-bit ciphertext block $c$ from an
$\ell$-bit plaintext block $p$ through the repetition of a key-dependent permutation,
called round function $\rho$. Each round is parameterized by a different round key
$k_i$ derived from the secret key $k$ through a key scheduling process. Let $r$ denote
the number of rounds; the ciphertext block is then defined as

$$c = E_k(p) = \rho_{k_r} \circ \rho_{k_{r-1}} \circ \cdots \circ \rho_{k_1}(p) .$$

In an SPN block cipher, the round function is composed of linear permutations 
and nonlinear substitutions, and the key is introduced by addition. The addition 
and linearity are considered over the vector space $\mathbb{F}_2^\ell$. Namely round keys
are introduced by a simple exclusive-or (XOR), and linear permutations are
homomorphic with respect to the XOR operation. Non-linear substitutions are
applied on small blocks of bits which are replaced by new blocks looked up from a
predefined table usually called s-box (for substitution-box). In what we shall call 
a classical SPN structure, the different s-boxes and linear transformations are 
bijective ( e.g . the Advanced Encryption Standard M)- But when they are not, it 
is common to use a so-called Feistel scheme in order to make the round function, 
and hence the overall cipher, invertible (e.g. the Data Encryption Standard USD- 
In the following, we only focus on the classical SPN structures. Extension of our 
work to Feistel schemes is provided in the full version of the paper. 

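In a classical SPN structure, the plaintext is considered as an $n$-dimensional
vector of $m$-bit coordinates: $p = (p_1, p_2, \ldots, p_n)$, with $\ell = nm$. The round
function is composed of a key addition layer $\sigma_{k_i}$, a nonlinear layer $\gamma$, and a
linear layer $\lambda$ with matrix $A = (a_{i,j})_{i,j}$, that is

$$\rho_{k_i} = \lambda \circ \gamma \circ \sigma_{k_i} .$$

The key addition layer is a simple XOR-ing of the round key:

$$\sigma_k(p) = p \oplus k .$$

The nonlinear layer consists of the parallel application of an $m \times m$ s-box $S$:

$$\gamma(p) = (S(p_1), S(p_2), \ldots, S(p_n)) ,$$

and the linear layer is a linear transformation over $(\mathbb{F}_{2^m})^n$:

$$\lambda(p) = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{pmatrix} \cdot \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} , \tag{1}$$

where the $a_{i,j}$ and the $p_j$ are considered as elements of $\mathbb{F}_{2^m}$.
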
Remark 1. The final round sometimes skips the linear layer and an additional 
key addition is often performed after the final nonlinear layer. The attack de- 
scribed in this paper works as well for these variants. 
3 Basic SCARE of Classical SPN Structures

3.1 Attacker Model 

We present a generic SCARE attack in a known-plaintext scenario, and we show 
how its complexity can be lowered in a chosen-plaintext scenario. Our attack 
does not require the knowledge of the ciphertext but only exploits the side- 
channel leakage of the cipher execution. Moreover, it is assumed that colliding 
s-box computations can be detected from the side-channel leakage. Specifically, 
we assume that the attacker is able to 

(i) identify the s-box computations in the side-channel leakage trace and extract 
the leakage corresponding to each s-box computation, 

(ii) decide whether two s-box computations $y_1 \leftarrow S(x_1)$ and $y_2 \leftarrow S(x_2)$ are
such that $x_1 = x_2$ or not from their respective leakages.

Remark 2. This assumption implicitly means that the cipher implementation 
processes the s-box computations in a sequential way and that two s-box com- 
putations of the same input at two different points in the execution produce 
identical side-channel leakages. These constraints are further discussed in the
last section.

Under the above assumption, the attacker can identify r different groups of n 
s-box computations, and hence recover the number r of rounds, the number n of 
s-boxes per round and hence the s-box size $m = \ell/n$, where $\ell$ is the block size. We
will therefore assume these parameters to be known in our attack description. 

In what follows, we first show how the above assumption enables the complete 
recovery of a secret cipher with SPN structure as described in Section 2. In
Section 4 we relax this assumption and extend our attack to deal with noisy
leakages which can lead to decision errors in the collision detections. 


3.2 Equivalent Representations 

Several equivalent representations are possible for an SPN cipher such as con- 
sidered here. For instance one can change the s-box S for the s-box S' defined 
as 

$$S'(x) = S(x \oplus \delta)$$

for some $\delta \in \mathbb{F}_{2^m}$, and replace every round key $k_i = (k_{i,1}, \ldots, k_{i,n})$ by

$$k'_i = (k_{i,1} \oplus \delta,\ k_{i,2} \oplus \delta,\ \ldots,\ k_{i,n} \oplus \delta) .$$

The two representations are clearly equivalent in a functional sense. Moreover, 
the ability of detecting collisions in s-box computations does not make it possible 
to distinguish between two different equivalent representations. 
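
The first equivalence can be checked numerically. The following is a minimal Python sketch, for illustration only: it assumes a toy cipher whose linear layer simply XORs neighbouring coordinates (an arbitrary choice made here, not a layer taken from the paper), and the names `encrypt` and `shifted_representation` are hypothetical.

```python
import random

def encrypt(p, sbox, round_keys):
    """Toy SPN: key addition, s-box layer, then a simple XOR-linear layer."""
    state = list(p)
    n = len(state)
    for k in round_keys:
        state = [x ^ ki for x, ki in zip(state, k)]                # key addition
        state = [sbox[x] for x in state]                           # nonlinear layer
        state = [state[i] ^ state[(i + 1) % n] for i in range(n)]  # toy linear layer
    return state

def shifted_representation(sbox, round_keys, delta):
    """Return (S', k') with S'(x) = S(x ^ delta) and k'_i = k_i ^ (delta, ..., delta)."""
    new_sbox = [sbox[x ^ delta] for x in range(len(sbox))]
    new_keys = [[ki ^ delta for ki in k] for k in round_keys]
    return new_sbox, new_keys

rng = random.Random(0)
m, n, rounds = 4, 4, 3                      # assumed toy dimensions
sbox = list(range(1 << m))
rng.shuffle(sbox)                           # random bijective s-box
keys = [[rng.randrange(1 << m) for _ in range(n)] for _ in range(rounds)]
sbox2, keys2 = shifted_representation(sbox, keys, delta=0b1010)

plaintexts = [[rng.randrange(1 << m) for _ in range(n)] for _ in range(100)]
same = all(encrypt(p, sbox, keys) == encrypt(p, sbox2, keys2) for p in plaintexts)
```

In the shifted representation every s-box input is offset by the same constant, so two inputs are equal in one representation exactly when they are equal in the other, which is why collision detection cannot tell the two apart.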

Another way to obtain equivalent representations is by changing the s-box S 
for the s-box S' defined as 


$$S'(x) = \alpha \cdot S(x)$$

for some $\alpha \in \mathbb{F}_{2^m}^*$, and by replacing the linear layer $A$ defined in (1) by the linear
layer $A'$ obtained from the matrix $(a'_{i,j})_{i,j}$ whose coefficients satisfy

$$a'_{i,j} = a_{i,j} / \alpha$$

for every $(i,j)$.

In our attack, we fix the first round key coordinate $k_{1,1}$ to 0 and we fix the
coefficient $a_{1,1}$ to 1, which is equivalent to fixing the variables $\delta$ and $\alpha$. Note
that $a_{1,1}$ may equal 0 (which is revealed by the attack), in which case we try
fixing $a_{1,2}$, then $a_{1,3}$, and so on. We describe hereafter the successive stages of
the attack.

3.3 Stage 1: Recovering $k_1$

Since we have fixed $k_{1,1} = 0$, we aim to recover the $n - 1$ remaining subkeys
$k_{1,2}, k_{1,3}, \ldots, k_{1,n}$. Let $I$ denote the set of indices $i$ for which $k_{1,i}$ is known. At
the beginning of the attack $I = \{1\}$. Then for any collision
$[y_i \leftarrow S(p_i \oplus k_{1,i})] \sim [y_j \leftarrow S(p_j \oplus k_{1,j})]$ for some $i \in I$, one deduces

$$k_{1,j} = p_j \oplus p_i \oplus k_{1,i} ,$$

and the index $j$ is added to $I$. We expect to retrieve all subkeys with fewer than
$2^{m/2}$ encryptions.
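
The chaining of collisions in this stage can be sketched as follows. This is a minimal Python simulation under stated assumptions: the collision oracle is idealized (two computations collide exactly when their hidden s-box inputs are equal), collisions are searched across all encryptions seen so far, and the function name `recover_first_round_key` is hypothetical.

```python
import random

def recover_first_round_key(true_k1, m, rng, max_encryptions=10_000):
    """Recover k_1 up to the convention k_{1,1} = 0 from first-round collisions."""
    n = len(true_k1)
    known = {0: 0}        # index set I (0-based), with the first subkey fixed to 0
    observations = []     # (s-box index, plaintext coordinate, hidden s-box input)
    encryptions = 0
    while len(known) < n and encryptions < max_encryptions:
        p = [rng.randrange(1 << m) for _ in range(n)]
        encryptions += 1
        # The hidden input is only used to emulate the collision oracle.
        observations += [(i, p[i], p[i] ^ true_k1[i]) for i in range(n)]
        changed = True
        while changed:                      # propagate until no new subkey is found
            changed = False
            for i, p_i, x_i in observations:
                if i not in known:
                    continue
                for j, p_j, x_j in observations:
                    if j not in known and x_i == x_j:   # collision detected
                        known[j] = p_j ^ p_i ^ known[i]
                        changed = True
    return [known.get(j) for j in range(n)], encryptions
```

Because the first subkey is fixed to 0, the recovered vector equals the true first round key XORed coordinate-wise with its first coordinate, which is one of the equivalent representations of Section 3.2.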

3.4 Stage 2: Recovering $A$, $S$ and $k_2$

Once $k_1$ has been recovered, one knows the inputs of the s-box in the first round.
Let us define $x_i = S(i)$ for every $i \in \{0, 1, \ldots, 2^m - 1\}$, so that recovering the
s-box means recovering the $2^m$ unknowns $x_0, x_1, \ldots, x_{2^m-1}$. The attack consists
in constructing a set of equations in the $x_i$'s, the $a_{i,j}$'s and the $k_{2,i}$'s. Solving
the obtained system hence amounts to recovering $A$, $S$ and $k_2$.

The first step of this stage consists in collecting the leakages $\ell_\beta$ from s-box
computations $y \leftarrow S(\beta)$ for every $\beta \in \mathbb{F}_{2^m}$. We shall denote by $B$ the obtained
leakage basis $\{\ell_\beta \mid \beta \in \mathbb{F}_{2^m}\}$. Such a basis can be constructed since $k_1$ is known
from the first stage, hence the inputs of the s-box computations in the first
round are known. This basis is then used to detect collisions between s-box
computations in the second round and s-box computations $y \leftarrow S(\beta)$. Let $w_i$ be
the $i$th s-box input before key addition in the second round (i.e. $w_i$ is the $i$th
$m$-bit output of the first round), in the encryption of some plaintext $p$. Then $w_i$
satisfies

$$w_i = a_{i,1}\, x_{j_1} \oplus a_{i,2}\, x_{j_2} \oplus \cdots \oplus a_{i,n}\, x_{j_n} ,$$

where $j_t = p_t \oplus k_{1,t}$ is a known index. If the corresponding s-box computation
$y_i \leftarrow S(w_i \oplus k_{2,i})$ collides with some s-box computation $y \leftarrow S(\beta)$ from $B$, then
we get the following quadratic equation

$$a_{i,1}\, x_{j_1} \oplus a_{i,2}\, x_{j_2} \oplus \cdots \oplus a_{i,n}\, x_{j_n} \oplus k_{2,i} = \beta .$$

Once several such equations have been collected, one can solve the system and
recover all the unknowns (i.e. the $x_i$'s, the $a_{i,j}$'s and the $k_{2,i}$'s).
Solving the System. In order to solve the quadratic system obtained from
all the collected equations, one can use the linearization method. The monomial
$a_{i,j}\, x_u$ is replaced by a new unknown $y_t$ for every triplet $t = (i, j, u)$ where
$1 \le i, j \le n$ and $0 \le u \le 2^m - 1$. We get a linear system with $2^m n^2 + n$ unknowns
(the $y_t$ and the $k_{2,i}$), which can be solved based on $2^m n^2 + n$ independent equations.
Since every encryption provides $n$ new equations, the required number of
encryptions is $2^m n + 1$.

However, using linearization is not mandatory and we show hereafter that the
system can be directly rewritten as a linear system. To do so, we consider the $n$
equations obtained for the different s-box computations at the same time. Let
$\beta_1, \beta_2, \ldots, \beta_n$ be the values such that $y_i \leftarrow S(w_i \oplus k_{2,i})$ collides with $y \leftarrow S(\beta_i)$.
The obtained system for the $n$ equations can be written in matrix form as

$$A \cdot x \oplus k_2 = \beta ,$$

where $A = (a_{i,j})_{i,j}$, $x = (x_{j_1}, x_{j_2}, \ldots, x_{j_n})^T$, $k_2 = (k_{2,1}, k_{2,2}, \ldots, k_{2,n})^T$ and
$\beta = (\beta_1, \beta_2, \ldots, \beta_n)^T$. Since $A$ is invertible, we have

$$x \oplus A^{-1} \cdot k_2 = A^{-1} \cdot \beta .$$

Let $k'_2 = (k'_{2,1}, k'_{2,2}, \ldots, k'_{2,n})$ denote the vector resulting from the product $A^{-1} \cdot k_2$
and let $a'_{i,j}$ denote the coefficients of $A^{-1}$. We obtain the $n$ following
equations:

$$\begin{aligned}
x_{j_1} \oplus k'_{2,1} &= a'_{1,1}\, \beta_1 \oplus a'_{1,2}\, \beta_2 \oplus \cdots \oplus a'_{1,n}\, \beta_n , \\
x_{j_2} \oplus k'_{2,2} &= a'_{2,1}\, \beta_1 \oplus a'_{2,2}\, \beta_2 \oplus \cdots \oplus a'_{2,n}\, \beta_n , \\
&\ \ \vdots \\
x_{j_n} \oplus k'_{2,n} &= a'_{n,1}\, \beta_1 \oplus a'_{n,2}\, \beta_2 \oplus \cdots \oplus a'_{n,n}\, \beta_n .
\end{aligned}$$

After collecting several such equations, we obtain a linear system with $n^2 + n + 2^m$
unknowns: the $x_i$'s, the $a'_{i,j}$'s and the $k'_{2,i}$'s. This system can hence be
solved based on $n^2 + n + 2^m$ independent equations. Since every encryption
provides $n$ new equations, the required number of encryptions is at least
$n + 1 + 2^m/n$. Once all the $a'_{i,j}$'s and the $k'_{2,i}$'s have been recovered, we can invert the
matrix $A^{-1}$ to get $A$ and then compute $k_2 = A \cdot k'_2$.
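
As a worked illustration of the two counts (our own arithmetic, taking the AES-like dimensions $n = 16$ and $m = 8$ as an assumed example), the linearized and the direct systems compare as follows:

```latex
\begin{align*}
\text{linearization:}\quad
  & 2^m n^2 + n = 256 \cdot 256 + 16 = 65552 \ \text{unknowns},
  & 2^m n + 1 = 4097 \ \text{encryptions}, \\
\text{direct linear system:}\quad
  & n^2 + n + 2^m = 256 + 16 + 256 = 528 \ \text{unknowns},
  & n + 1 + 2^m/n = 33 \ \text{encryptions}.
\end{align*}
```

Under these assumed dimensions, the direct rewriting reduces the number of required encryptions by roughly two orders of magnitude.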

As explained in Section 3.2 we must fix $a_{1,1} = 1$ in order to fix a representation
among the equivalence class of the cipher. For the above system, this amounts to
fixing $a'_{1,1} = 1$. Here again, $a'_{1,1}$ may equal 0, in which case the solving fails and
the attacker must try again by fixing $a'_{1,2}$ and so on. Another degree of freedom
exists that is not recovered by solving the above system: one can add a fixed
offset $\delta$ to every s-box output and to every coordinate of $k'_2$ (which amounts to
adding $A \cdot (\delta, \delta, \ldots, \delta)$ to $k_2$). Clearly, such a modification would not change the
collected equations. In order to set this degree of freedom, we can fix one of the
s-box outputs, say $x_0$, to 0. To summarize, in addition to the collected $n$-equation
groups from each encryption, we add the equations $a'_{1,1} = 1$ and $x_0 = 0$ in order
to obtain a full rank system.
Note that fixing $x_0 = 0$ may induce a non-equivalent representation of the
cipher. Indeed, the recovered cipher is equivalent to the real cipher but a fixed
offset $\delta$ is XORed to each s-box output in the last round. As a consequence the
resulting ciphertexts are XORed with the constant value $A \cdot (\delta, \delta, \ldots, \delta)$. Note
that if a key addition is performed after the nonlinear layer in the last round
then its recovery absorbs this offset as for the other rounds. Otherwise, one
must recover the offset $\delta$ in order to correct the ciphertext values and get an
equivalent representation of the cipher. This can be easily done by comparing a
real ciphertext with the one obtained from the recovered cipher.


Chosen Plaintexts Attack. To optimize the attack, one shall select the plaintexts
in order to make every unknown of the system appear with the least possible
number of requested encryptions. The $a'_{i,j}$'s and the $k'_{2,i}$'s all appear in each
group of $n$ equations resulting from a single encryption. On the other hand such
a group of equations only involves $n$ out of the $2^m$ unknowns $x_i$. The best approach
is hence to make $n$ different $x_i$'s appear for each encryption request. To do so,
one can simply ask for the encryption of the plaintext

$$(i \cdot n + 0,\ i \cdot n + 1,\ i \cdot n + 2,\ \ldots,\ (i+1)n - 1) \oplus k_1 ,$$

for $i = 0, 1, \ldots, \lceil 2^m/n \rceil - 1$. The s-box inputs in the first round of the
corresponding encryptions then equal $(0, 1, 2, \ldots, n-1)$, $(n, n+1, \ldots, 2n-1)$, etc.
Every possible s-box value thus appears in the system after $\lceil 2^m/n \rceil$ encryptions.
It just remains to ask for the encryption of $n + 1$ additional plaintexts to get a
full rank linear system in the $n^2 + n + 2^m$ unknowns.
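
The plaintext selection above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `chosen_plaintexts` is hypothetical, and wrapping the inputs modulo $2^m$ when $n$ does not divide $2^m$ is an assumption made here, since the text does not discuss that case.

```python
def chosen_plaintexts(k1, m):
    """Plaintexts whose first-round s-box inputs sweep 0, 1, ..., 2^m - 1.

    k1 is the first round key recovered in Stage 1 (a list of n m-bit values).
    """
    n = len(k1)
    size = 1 << m
    count = -(-size // n)                 # ceil(2^m / n) encryption requests
    plaintexts = []
    for i in range(count):
        # Target s-box inputs (i*n, i*n + 1, ..., (i+1)*n - 1), wrapped modulo 2^m
        # (the wrap-around is our assumption for the case where n does not divide 2^m).
        inputs = [(i * n + c) % size for c in range(n)]
        plaintexts.append([x ^ k for x, k in zip(inputs, k1)])
    return plaintexts
```

For example, with $n = 16$ and $m = 8$ this yields 16 plaintexts whose first-round s-box inputs together cover all 256 values, so that every unknown $x_i$ appears in the system.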


3.5 Stage 3: Recovering $k_3, k_4, \ldots, k_r$

Once the first two stages have been completed, it only remains to recover the
last round keys $k_3, k_4, \ldots, k_r$. This is simply done by detecting a collision
$[y_i \leftarrow S(p_{j,i} \oplus k_{j,i})] \sim [y \leftarrow S(\beta_{j,i})]$, giving $k_{j,i} = p_{j,i} \oplus \beta_{j,i}$ for every round
$j \in \{3, 4, \ldots, r\}$ and every s-box index $i \in \{1, 2, \ldots, n\}$.

4 SCARE in the Presence of Noisy Leakage 

So far, we have considered an idealized model in which the attacker is able to 
detect a collision between two s-box computations from their respective leakages 
with a 100% confidence. As a matter of fact, the proposed SCARE attack does
not tolerate any false-positive error in the collision detections. In this section,
we relax this assumption and describe a practical SCARE attack in the presence 
of noise in the side-channel leakage. As for the basic attack, the principle is 
to exploit equations arising from collisions in s-box computations. We explain 
hereafter how to collect sound equations with high confidence in the presence of 
noisy leakage. 
4.1 Stage 1: Recovering $k_1$

In our SCARE attack, the first stage exactly corresponds to the usual scenario
of linear collision attacks that aim at recovering key byte differences $k_{1,i} \oplus k_{1,j}$
by detecting collisions between s-box computations in the first round from the
side-channel leakage [3,4,24,17].

In a linear collision attack, the attacker is assumed to possess the leakage
traces corresponding to the encryption of $N$ random plaintexts $(p_t)_{t \le N}$. Let
$\ell_{t,i}$ denote the leakage associated to the $i$th s-box computation in the encryption of
$p_t$. The principle is to compute the mean leakage $\bar\ell_{i,x}$ of the set $\{\ell_{t,i} \,;\, p_{t,i} = x\}$
for every $i$ and $x$, in order to average the leakage noise and detect collisions
more easily. As explained in Section 3.3, detecting a collision between $\bar\ell_{i,x}$ and
$\bar\ell_{j,y}$ implies the equality of the two s-box inputs $x \oplus k_{1,i}$ and $y \oplus k_{1,j}$ and
provides the linear equation $k_{1,i} \oplus k_{1,j} = x \oplus y$. In [3], Bogdanov points out that
the equation system arising from the key byte differences is overdetermined and
that the redundant information could be used to tolerate some erroneous equations.
In [17], Gérard and Standaert further show that solving such an equation
system can be written as an LDPC¹ code decoding problem for which an efficient
algorithm is known. We suggest using their method for the first stage of our
practical SCARE attack.
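
The averaging step of a linear collision attack can be sketched as follows. This is a minimal NumPy sketch under assumed array layouts (plaintexts as an $N \times n$ integer array, leakages as an $N \times n \times d$ array); the function name `mean_leakages` is hypothetical, and the sketch covers only the averaging, not the LDPC decoding of [17].

```python
import numpy as np

def mean_leakages(plaintexts, leakages, m):
    """Average first-round s-box leakages per (s-box index i, plaintext value x).

    plaintexts: int array of shape (N, n); leakages: float array of shape (N, n, d).
    Returns (means, counts) with means[i, x] the mean leakage (NaN if never observed).
    """
    N, n, d = leakages.shape
    sums = np.zeros((n, 1 << m, d))
    counts = np.zeros((n, 1 << m), dtype=int)
    for i in range(n):
        # Unbuffered accumulation, so repeated plaintext values are all counted.
        np.add.at(sums[i], plaintexts[:, i], leakages[:, i, :])
        np.add.at(counts[i], plaintexts[:, i], 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts[:, :, None]
    return means, counts
```

A collision test would then compare `means[i, x]` with `means[j, y]` (for instance through a distance or a correlation), which is where a decision rule and its error rate enter the picture.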

4.2 Stage 2: Recovering $A$, $S$ and $k_2$

As for the attack without collision errors, the second stage is the main task. To
deal with the leakage noise, we make the well-admitted Gaussian noise assumption.
Namely, we assume that the leakage corresponding to an s-box computation
$y \leftarrow S(\beta)$ follows a multivariate Gaussian distribution with mean $m_\beta$ and
covariance matrix $\Sigma_\beta$, denoted $\mathcal{N}(m_\beta, \Sigma_\beta)$.


Building Leakage Templates. The first step of the second stage consists in
estimating the leakage parameters. Namely, for each $\beta \in \mathbb{F}_{2^m}$ we estimate the
mean $m_\beta$ and the covariance matrix $\Sigma_\beta$ of the leakage from the s-box computation
$y \leftarrow S(\beta)$. The leakage basis of the noise-free attack is then replaced by a
leakage template basis $B = \{ (\hat m_\beta, \hat\Sigma_\beta) \mid \beta \in \mathbb{F}_{2^m} \}$ where $\hat m_\beta$ and $\hat\Sigma_\beta$ denote the
estimated values for the leakage parameters. The estimation is obtained from the
leakages used in the first stage, and possibly more, until the estimated means
converge.

Our convergence criterion is based on the Hotelling $T^2$-test which is the natural
extension of the Student T-test for multinormal distributions (see for instance
[23]). Let $d$ denote the dimension of the distribution $\mathcal{N}(m_\beta, \Sigma_\beta)$, i.e. the number
of points in an s-box leakage trace, and let $F^{-1}_{(d_1,d_2)}(\cdot)$ denote the quantile function
of the Fisher's F-distribution with parameters $(d_1, d_2)$ (i.e. $F_{(d_1,d_2)}$ is the
distribution CDF). For some confidence parameter $\alpha \in [0; 1]$ and some estimation
quality parameter $q \in [0; 1]$, our convergence criterion is satisfied when we have:


1 Low Density Parity Check
N-d )( a )


The rationale of this definition is detailed in the full version of the paper. 

Based on this criterion, the template basis is built iteratively: we first collect 
$N$ leakage samples for every s-box input value $\beta$. Based on these samples, we
estimate the distribution parameters $(\hat m_\beta, \hat\Sigma_\beta)$ for every $\beta$, as well as the inter-
class covariance matrix S. Then if we have max^ R a (a'p / det(S)) 1 / d < q for some 
chosen confidence a and estimation quality parameter q we stop. Otherwise we 
continue with twice as many samples (namely we collect $N$ more leakage samples
and set $N$ to $2N$), and so on until we get a satisfying estimation quality. In
practice, we shall use $\alpha = 99\%$ and $q = 0.5$.


Remark 3. A possible variant for building the template basis is to make the
identical noise assumption which considers that $\Sigma_\beta$ is equal to some constant
matrix $\Sigma$ for every $\beta$. This enables a better estimation $\hat\Sigma$ based on all leakage
samples.


Collecting Equations. Once the template basis has been built, we collect
several groups of $n$ equations of the form $x \oplus k'_2 = A^{-1} \cdot \beta$, as in the basic
attack (see Section 3.4). Due to the noise, we cannot determine the value of $\beta$
with a 100% confidence. To deal with this issue we use averaging. Namely, the
encryption of the same plaintext $p$ is requested several (say $N$) times and we
compute the average leakage for each s-box computation in the second round.
Let $\bar\ell_i$ denote the average leakage for the $i$th s-box, and let $\beta^*_i$ denote the
corresponding (unknown) s-box input. The average leakage $\bar\ell_i$ follows a distribution
$\mathcal{N}(m_{\beta^*_i}, \frac{1}{N}\Sigma_{\beta^*_i})$. Then we must recover the $n$ corresponding values $\beta^*_1, \beta^*_2, \ldots,
\beta^*_n$ in order to get a group of equations. The problem is hence to determine to
which distribution $\mathcal{N}(m_\beta, \frac{1}{N}\Sigma_\beta)$ each leakage $\bar\ell_i$ belongs, based on the template
basis. For such a purpose, we use a maximum likelihood approach, namely we
follow the classical approach of template attacks [5]. Given the leakage observation
$\bar\ell_i$, the probability that the $i$th s-box input value $\beta^*_i$ equals some value $\beta$
satisfies


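$$\Pr[\beta^*_i = \beta \mid \bar\ell_i] = \frac{\phi_\beta(\bar\ell_i)}{\sum_{\beta'} \phi_{\beta'}(\bar\ell_i)} ,$$

where $\phi_\beta$ denotes the pdf of $\mathcal{N}(m_\beta, \frac{1}{N}\Sigma_\beta)$ satisfying

$$\phi_\beta(\ell) \propto \exp\left( -\tfrac{N}{2}\, (\ell - m_\beta)^T \cdot \Sigma_\beta^{-1} \cdot (\ell - m_\beta) \right) .$$

The likelihood of the candidate $\beta$ for $\beta^*_i$ based on the estimations $(\hat m_\beta)_\beta$ and
$(\hat\Sigma_\beta)_\beta$ is hence defined as

$$L(\beta \mid \bar\ell_i) := \frac{\exp\left( -\tfrac{N}{2}\, (\bar\ell_i - \hat m_\beta)^T \cdot \hat\Sigma_\beta^{-1} \cdot (\bar\ell_i - \hat m_\beta) \right)}{\sum_{\beta' \in \mathbb{F}_{2^m}} \exp\left( -\tfrac{N}{2}\, (\bar\ell_i - \hat m_{\beta'})^T \cdot \hat\Sigma_{\beta'}^{-1} \cdot (\bar\ell_i - \hat m_{\beta'}) \right)} . \tag{2}$$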
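
The likelihood of equation (2) can be evaluated for all candidates at once. The following NumPy sketch is illustrative only: the function name `candidate_likelihoods` and the array layout (one row of `means` and one matrix of `covs` per candidate $\beta$) are assumptions, and the subtraction of the largest exponent is a standard numerical safeguard that is not part of the paper's description.

```python
import numpy as np

def candidate_likelihoods(avg_leak, means, covs, N):
    """Normalized likelihoods L(beta | avg_leak) of equation (2) for every beta.

    avg_leak: averaged leakage, shape (d,)
    means:    estimated means, shape (B, d) with B = 2^m candidates
    covs:     estimated covariance matrices, shape (B, d, d)
    N:        averaging level
    """
    diffs = avg_leak - means                              # (B, d)
    precisions = np.linalg.inv(covs)                      # (B, d, d), stacked inverses
    quad = np.einsum("bi,bij,bj->b", diffs, precisions, diffs)
    log_w = -0.5 * N * quad                               # exponents of equation (2)
    log_w -= log_w.max()                                  # avoid underflow in exp
    w = np.exp(log_w)
    return w / w.sum()
```

The most likely candidate is then `int(np.argmax(L))` and its likelihood `L.max()` serves as the confidence attached to that coordinate.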
The corresponding likelihood for a vector $\beta = (\beta_1, \ldots, \beta_n)$ given the average
leakage vector $\bar\ell = (\bar\ell_1, \bar\ell_2, \ldots, \bar\ell_n)$ can then be defined as $L(\beta \mid \bar\ell) := \prod_{i=1}^{n} L(\beta_i \mid \bar\ell_i)$.

Note that the most likely candidate $\arg\max_\beta L(\beta \mid \bar\ell)$ is also the one whose
coordinates are the most likely, i.e. equal to $\arg\max_{\beta_i} L(\beta_i \mid \bar\ell_i)$ for every $i$.

In practice, we shall select the most likely value of (3 as the good one with a con- 
fidence \-fi . However we not only want to select the best candidate, we further want 
its likelihood to be high (i.e. close to 1) in order to have a high confidence in the
selected candidate. Getting a vector $\beta$ with high likelihood may however be far more
difficult than getting a single coordinate $\beta_i$ with high likelihood since for the vector
one needs all coordinates to have high likelihood. Indeed, the probability of having
a high likelihood for the vector $\beta$ is the probability of having a high likelihood for
all coordinates $\beta_i$, which is exponentially smaller in $n$.

To deal with this issue, our approach is to restrict the number of equations
of the form $x \oplus k'_2 = A^{-1} \cdot \beta$ that are needed for the attack to succeed. For such a
purpose, we first solve a subsystem (i.e. with fewer unknowns) for which we require
fewer equations than in the original attack, and then we recover the remaining
unknowns based on simpler forms of equations.


Solving a Subsystem. We first solve a subsystem involving the $a'_{i,j}$'s, the
$k'_{2,i}$'s and a restricted number of $x_i$'s. To do so we select a set of $s$ values $\beta$,
say $\mathcal{S} = \{0, 1, \ldots, s-1\}$, and we only request the device for the encryption of
plaintexts from the set

$$\mathcal{V}_s = \{ (p_1, p_2, \ldots, p_n) \mid \forall i : 0 \le p_i \oplus k_{1,i} \le s-1 \} .$$

These plaintexts are such that all s-box inputs in the first encryption round are
in $\mathcal{S}$. We hence obtain a linear system as described in Section 3.4 but with
$n^2 + n + s - 2$ unknowns: the $a'_{i,j}$'s (except $a'_{1,1}$ which is set to 1), the $k'_{2,i}$'s, and
the $x_i$'s for $i \in \mathcal{S}$ (except $x_0$ which is set to 0). Such a system can be solved based on
$t = n + 1 + \lceil (s-2)/n \rceil$ good groups of equations. The value of $s$ must hence be
selected to ensure that the plaintext subspace $\mathcal{V}_s$ is large enough to get $t$ good
groups with high confidence, while making $t$ the smallest possible.

In order to increase our chances to actually come up with $t$ groups of correct
equations, one direction would be to select a larger set of, say, $q$ groups of equations
(instead of only taking the t best) and test all combinations of t groups among 
them. The complexity of the resulting attack will however increase dramatically 
with q. 

So, let us assume that we have a computing power of $2^k$, meaning that we
can try to solve $2^k$ linear systems, and that we can get the leakage measurement
from $T$ encryptions. Then our approach is to request $N$ times the encryption
of $T/N$ different plaintexts in $\mathcal{V}_s$. For each of the $T/N$ plaintexts, we compute
the most likely candidate $\beta$ for the s-box inputs in the second round, based on
the $N$-averaged leakages. We thus obtain $T/N$ groups of $n$ equations with a
corresponding confidence (i.e. the likelihood of the best candidate $\beta$). Then we
select the $q$ groups for which we get the highest confidence in the best candidate
$\beta$, where $q$ is such that $\binom{q}{t} \approx 2^k$ (that is $q = c_q\, t\, 2^{k/t}$ for some $c_q \in [e^{-1}; 1]$), and
we try to solve each system arising from $t$ of these $q$ groups. In order to make
sure that a found solution is the good one, we make the system overdetermined.
This can be done without increasing the number $t$ of needed equation groups.
Namely, we take $s < n + 2$ in order to get $t = n + 1 + \lceil (s-2)/n \rceil = n + 2$. We thus
obtain systems of $n^2 + 2n$ equations with $n^2 + n + s - 2$ unknowns. Obtaining a
bad system that has a solution roughly occurs with probability $p_e \approx (1/2^m)^{n-s+2}$.
So we take $s$ to make this probability small, typically $s = n + 2 - 32/m$ giving
$p_e \approx 2^{-32}$. For instance, for $n = 16$ and taking $s = 14$, we then have to select
$t = 18$ good groups of equations from $|\mathcal{V}_s| \approx 2^{61}$ possible encryptions (which is
quite enough). Another direction in increasing our chances of success would be
to select the optimal averaging level.
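
The parameter choices of this paragraph are simple arithmetic and can be reproduced with a short Python helper. The function name `subsystem_parameters` and the returned dictionary keys are hypothetical; choosing $q$ as the largest integer with $\binom{q}{t} \le 2^k$ is one possible reading of the condition $\binom{q}{t} \approx 2^k$.

```python
from math import comb, log2

def subsystem_parameters(n, m, s, k):
    """Arithmetic behind the subsystem approach for given (n, m, s) and budget 2^k."""
    t = n + 1 + -(-(s - 2) // n)          # t = n + 1 + ceil((s - 2) / n) groups
    equations = n * t                     # each group provides n equations
    unknowns = n * n + n + s - 2
    log2_p_e = -m * (equations - unknowns)    # p_e ~ (1 / 2^m)^(equations - unknowns)
    log2_space = n * log2(s)                  # |V_s| = s^n candidate plaintexts
    q = t
    while comb(q + 1, t) <= 2 ** k:           # largest q with C(q, t) <= 2^k
        q += 1
    return {"t": t, "equations": equations, "unknowns": unknowns,
            "log2_p_e": log2_p_e, "log2_space": log2_space, "q": q}

params = subsystem_parameters(n=16, m=8, s=14, k=20)
```

With $n = 16$, $m = 8$ and $s = 14$ this gives $t = 18$, an excess of four equations over the unknowns (hence $p_e \approx 2^{-32}$) and $\log_2 |\mathcal{V}_s| \approx 60.9$, matching the figures quoted above; the budget $k = 20$ is an arbitrary value chosen for the example.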

Selecting the Averaging Level. We now explain how to select the averaging 
level N in order to optimize the success probability of the attack. Increasing 
the averaging level is good on the one hand to lower the noise and get better 
confidence in the recovered s-box inputs. On the other hand, the lower N, the 
greater the number T /N of different equation groups among which we can select 
the $q$ best ones. To select a good tradeoff, we adopt the approach of [28] which
estimates the success probability of an attack based on estimated leakage parameters.
Namely we assume that the estimated parameters $(\hat m_\beta)_\beta$ and $(\hat\Sigma_\beta)_\beta$ are
the real leakage parameters and we simulate the attack accordingly. To simulate
an attack, we fill two lists Succ and Fail by repeating the following steps:

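1: $\beta^* \xleftarrow{\$} (\mathbb{F}_{2^m})^n$
2: for $i = 1$ to $n$ do $\bar\ell_i \xleftarrow{\$} \mathcal{N}(\hat m_{\beta^*_i}, \frac{1}{N}\hat\Sigma_{\beta^*_i})$
3: $L_{\max} \leftarrow \prod_i \max_\beta L(\beta \mid \bar\ell_i)$
4: if $\arg\max_\beta L(\beta \mid \bar\ell_i) = \beta^*_i$ for every $i$
5: then add $L_{\max}$ to Succ
6: else add $L_{\max}$ to Fail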

After iterating the above steps $T/N$ times, one checks whether the $q$ maximum
values of Succ ∪ Fail include at least $t$ values from Succ or not. In the affirmative,
the simulated attack succeeded, otherwise it failed. Once the attack simulation
has been performed several times, we obtain an estimation for the success
probability of the attack.
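
The simulation procedure above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the function names (`likelihoods`, `simulate_attack`, `success_probability`) are hypothetical, the templates are passed as arrays `means` (shape $2^m \times d$) and `covs` (shape $2^m \times d \times d$), and s-box inputs are drawn uniformly as in step 1.

```python
import numpy as np

def likelihoods(leak, means, precisions, N):
    """Normalized likelihoods of equation (2) for all candidates."""
    diffs = leak - means
    log_w = -0.5 * N * np.einsum("bi,bij,bj->b", diffs, precisions, diffs)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def simulate_attack(means, covs, n, N, T, q, t, rng):
    """One simulated attack: True if the q best groups contain at least t correct ones."""
    precisions = np.linalg.inv(covs)
    scores = []                                   # (L_max, group is correct)
    for _ in range(T // N):
        beta_star = rng.integers(len(means), size=n)                    # step 1
        l_max, correct = 1.0, True
        for b in beta_star:
            leak = rng.multivariate_normal(means[b], covs[b] / N)       # step 2
            L = likelihoods(leak, means, precisions, N)
            l_max *= float(L.max())                                     # step 3
            correct = correct and int(L.argmax()) == int(b)             # step 4
        scores.append((l_max, correct))           # steps 5-6: Succ or Fail entry
    scores.sort(key=lambda entry: entry[0], reverse=True)
    return sum(ok for _, ok in scores[:q]) >= t

def success_probability(means, covs, n, N, T, q, t, runs, seed=0):
    """Fraction of successful simulated attacks over several runs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([simulate_attack(means, covs, n, N, T, q, t, rng)
                          for _ in range(runs)]))
```

Running `success_probability` for several values of $N$ with a fixed budget $T$ gives an estimated success rate per averaging level, from which the level with the highest rate can be retained.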

Compared to a real attack experiment, the obtained success probability is
affected by two differences: the actual leakage distributions $\mathcal{N}(m_{\beta^*_i}, \frac{1}{N}\Sigma_{\beta^*_i})$ are
replaced by the estimated distributions $\mathcal{N}(\hat m_{\beta^*_i}, \frac{1}{N}\hat\Sigma_{\beta^*_i})$, and the distribution of
the vector $\beta^*$ of s-box inputs in the second round is replaced by the uniform
distribution, although this is not the case in practice since the plaintexts are
randomly drawn from $\mathcal{V}_s$ instead of $\{0, 1\}^\ell$. However, for good estimations of the
leakage parameters, we expect to get a good estimation of the trade-off of choice
for averaging.

Recovering Remaining Unknowns. For the remaining unknowns $x_s, x_{s+1},
\ldots, x_{2^m-1}$ we will here again use an iterative approach that recovers them one
by one. For the sake of clarity, we assume that the linear layer is such that the
matrix $A$ has a column $j_0$ with no zero coefficients. Then our approach is to take
a random plaintext $p$ in $\mathcal{V}_s$ and to set its $j_0$th coordinate to $s \oplus k_{1,j_0}$ so that
the $j_0$th s-box input equals $s$. By definition, the $i$th s-box input in the second
round satisfies

$$\beta^*_i = a_{i,1}\, x_{t_1} \oplus a_{i,2}\, x_{t_2} \oplus \cdots \oplus a_{i,n}\, x_{t_n} \oplus k_{2,i} ,$$

where $t_j \le s - 1$ for every $j \ne j_0$ and $t_{j_0} = s$. This can be rewritten

$$\beta^*_i = a_{i,j_0}\, x_s \oplus k_{2,i} \oplus \bigoplus_{j \ne j_0} a_{i,j}\, x_{t_j} . \tag{3}$$

Since we know the values of the $a_{i,j}$'s, the $k_{2,i}$'s and the $x_{t_j}$'s for $t_j \le s - 1$,
recovering $x_s$ amounts to recovering $\beta^*_i$. And as we cannot recover $\beta^*_i$ with a
100% success probability, we use a maximum likelihood approach.

Specifically, the likelihood $L(\omega)$ of each candidate value $\omega$ for $x_s$ is initialized
to 0 if $\omega \in \{x_0, x_1, \ldots, x_{s-1}\}$ (indeed $x_s \notin \{x_0, x_1, \ldots, x_{s-1}\}$ as the s-box is
bijective) and to $(2^m - s)^{-1}$ otherwise. Then the leakage resulting from each
s-box computation is used to update the likelihood of each candidate for $x_s$.
Namely, the likelihood $L(\omega)$ of the candidate $\omega$ is multiplied by the likelihood of
the candidate $\beta_i^\omega$ for the $i$th s-box input, where $\beta_i^\omega = a_{i,j_0}\, \omega \oplus k_{2,i} \oplus \bigoplus_{j \ne j_0} a_{i,j}\, x_{t_j}$
according to (3). Doing so for every s-box, $L(\omega)$ is updated by

$$L(\omega) \leftarrow L(\omega) \times \prod_{i=1}^{n} L(\beta_i^\omega \mid \ell_i) ,$$

where $L(\cdot \mid \ell_i)$ is computed as in (2) with $N = 1$ (since we do not use averaging
here). Eventually, the likelihood vector is normalized, that is all the coordinates
are divided by $\sum_\omega L(\omega)$. We iterate this process for several encryptions until one
likelihood value $L(\omega)$ gets close enough to 1. Then we deduce $x_s = \omega$, and start
again with $x_{s+1}$, and so on until $x_{2^m-1}$. Note that we can stop once $x_{2^m-2}$ is recovered since
a single value remains for $x_{2^m-1}$.
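
The iterative likelihood update can be sketched as follows. This NumPy sketch is illustrative and makes simplifying assumptions: the caller is expected to have precomputed, for each s-box $i$ and each candidate $\omega$, the value $\beta_i^\omega$ given by (3) (so no finite-field arithmetic appears here) and the per-s-box likelihoods of equation (2); the names `init_posterior` and `update_posterior` are hypothetical.

```python
import numpy as np

def init_posterior(m, known_outputs):
    """Initial likelihoods for x_s: 0 on already recovered outputs, uniform elsewhere."""
    post = np.ones(1 << m)
    post[list(known_outputs)] = 0.0
    return post / post.sum()              # (2^m - s)^(-1) on the remaining candidates

def update_posterior(post, beta_of_omega, sbox_likelihoods):
    """One encryption's update L(w) <- L(w) * prod_i L(beta_i^w | l_i), then normalize.

    beta_of_omega:    int array (n, 2^m), entry [i, w] is beta_i^w from equation (3)
    sbox_likelihoods: float array (n, 2^m), entry [i, b] is L(b | l_i) from equation (2)
    """
    n = beta_of_omega.shape[0]
    rows = np.arange(n)[:, None]
    factors = sbox_likelihoods[rows, beta_of_omega]   # (n, 2^m): L(beta_i^w | l_i)
    post = post * np.prod(factors, axis=0)
    return post / post.sum()
```

In use, `update_posterior` would be called once per encryption until `post.max()` exceeds a chosen threshold close to 1, at which point `int(post.argmax())` is taken as $x_s$; the threshold value is left to the attacker and is not specified here.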

4.3 Stage 3: Recovering $k_3, k_4, \ldots, k_r$

Eventually, the last round keys can be recovered one by one by performing any 
classical side-channel key recovery attack (since we now know the design of the 
cipher). We suggest using a maximum likelihood approach based on the template
basis.²


5 Experiments 

We report hereafter the results of various simulations of the practical SCARE
attack described in the previous section. Each simulated attack aims at
recovering a secret cipher with classical SPN structure (such as described in
Section 2). We consider two different settings for the cipher dimensions:

• the (128,8)-setting: 128-bit message block and 8-bit s-boxes, as in the
AES block cipher [13] (i.e. $\ell = 128$, $n = 16$, $m = 8$),

• the (64,4)-setting: 64-bit message block and 4-bit s-boxes, as in the LED [20]
and PRESENT [7] lightweight block ciphers (i.e. $\ell = 64$, $n = 16$, $m = 4$).

² Such a technique is well known and pretty similar to that used in the previous
section, so we do not detail it here.

For each attack experiment, a random secret cipher is picked. Namely, we
randomly generate a full-rank $n \times n$ matrix over $\mathbb{F}_{2^m}$, a
bijective $m$-bit s-box, and several $\ell$-bit round keys. The attack succeeds
if it recovers an equivalent representation of the generated cipher.
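The generation of a random secret cipher can be sketched as follows. This is an illustrative sketch rather than the procedure actually used for the experiments: the field polynomial, the number of round keys and the helper names (`gf_mul`, `gf_inv`, `is_full_rank`, `random_cipher`) are all assumptions, and each round key is represented as $n$ words of $m$ bits.

```python
import random

def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) defined by the polynomial poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> m:
            a ^= poly
        b >>= 1
    return r

def gf_inv(a, m, poly):
    """Inverse of a nonzero element, computed as a^(2^m - 2)."""
    result, base, e = 1, a, (1 << m) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base, m, poly)
        base = gf_mul(base, base, m, poly)
        e >>= 1
    return result

def is_full_rank(matrix, m, poly):
    """Gaussian elimination over GF(2^m); True if the square matrix is invertible."""
    rows = [row[:] for row in matrix]
    n = len(rows)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return False
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col], m, poly)
        for r in range(col + 1, n):
            f = gf_mul(rows[r][col], inv, m, poly)
            if f:
                rows[r] = [x ^ gf_mul(f, y, m, poly)
                           for x, y in zip(rows[r], rows[col])]
    return True

def random_cipher(n, m, rounds, poly, rng):
    """Pick a full-rank matrix, a bijective s-box and round keys at random."""
    while True:  # rejection sampling until the matrix is invertible
        A = [[rng.randrange(1 << m) for _ in range(n)] for _ in range(n)]
        if is_full_rank(A, m, poly):
            break
    sbox = list(range(1 << m))
    rng.shuffle(sbox)  # a random permutation is a bijective s-box
    keys = [[rng.randrange(1 << m) for _ in range(n)] for _ in range(rounds)]
    return A, sbox, keys

# (64,4)-setting with placeholder choices: x^4 + x + 1 and 5 round keys.
A, sbox, keys = random_cipher(n=16, m=4, rounds=5, poly=0b10011,
                              rng=random.Random(0))
```

Rejection sampling is cheap here since most random square matrices over $\mathbb{F}_{2^m}$ are invertible, so the loop is expected to stop after very few draws.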

In order to evaluate our attack under a realistic leakage model, we have
profiled the leakage of an 8-bit s-box computation on an AVR chip.³ The
side-channel leakage was captured by means of an electromagnetic probe and a
digital oscilloscope with a sampling rate of 1G sample per second. To infer a
leakage model from the measurements we made the Gaussian and independent noise
assumptions. We therefore estimated the mean leakage for every s-box input value
and the mean leakage for every s-box output value based on 100000 leakage
traces. We then selected three leakage points for the input and three leakage
points for the output. We thus obtained 256 means
$(m_{1,\beta}, m_{2,\beta}, m_{3,\beta})_\beta$ for the 256 possible input values
$\beta \in \{0, 1, \ldots, 255\}$ and 256 means
$(m_{4,\beta}, m_{5,\beta}, m_{6,\beta})_\beta$ for the 256 possible output
values $\beta \in \{0, 1, \ldots, 255\}$. Afterwards we estimated the noise
covariance matrix $\Sigma$ for the selected points (i.e. the matrix of
covariances between the 6 points after subtracting the means). A preview of the
obtained parameters can be found in the full version of the paper. In particular
we get a multivariate SNR⁴ of 0.033 and univariate SNRs⁵ of 0.13, 0.033, 0.099,
0.058, 0.047, and 0.051 for the different leakage points. These inferred
parameters provide us with a leakage model for our attack simulations. Namely,
for a given cipher with s-box $S$, the leakage associated to the s-box
computation with input $\beta$ is randomly drawn from the multivariate Gaussian
distribution $\mathcal{N}(m_\beta, \Sigma)$ with mean satisfying

$$m_\beta = (m_{1,\beta}, m_{2,\beta}, m_{3,\beta}, m_{4,S(\beta)}, m_{5,S(\beta)}, m_{6,S(\beta)}).$$
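Drawing a simulated leakage from $\mathcal{N}(m_\beta, \Sigma)$ can be sketched with a Cholesky factor of the covariance matrix. The sketch below is a generic sampling recipe, not the authors' simulation code; the means and the covariance matrix are placeholders (the profiled values are only given in the full version of the paper), and the helper names are hypothetical.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular C with C * C^T = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    C = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(acc) if i == j else acc / C[j][j]
    return C

def draw_leakage(beta, sbox, in_means, out_means, chol, rng):
    """Draw one 6-point leakage for s-box input beta from N(m_beta, Sigma)."""
    # m_beta: three input points indexed by beta, three output points by S(beta).
    mean = list(in_means[beta]) + list(out_means[sbox[beta]])
    z = [rng.gauss(0.0, 1.0) for _ in mean]          # independent N(0, 1) draws
    return [mu + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, mu in enumerate(mean)]

# Placeholder parameters: Hamming-weight-like means and a mildly correlated noise.
rng = random.Random(0)
sbox = list(range(256))
rng.shuffle(sbox)
in_means = [(float(bin(v).count("1")),) * 3 for v in range(256)]
out_means = [(0.5 * bin(v).count("1"),) * 3 for v in range(256)]
Sigma = [[1.0 if i == j else 0.2 for j in range(6)] for i in range(6)]
C = cholesky(Sigma)
leak = draw_leakage(0x3A, sbox, in_means, out_means, C, rng)
```

Multiplying a vector of independent standard normal draws by `C` yields noise with covariance `Sigma`, which is then added to the mean vector $m_\beta$.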


Stage 1. For the recovery of $k_1$, we implemented the Gérard and Standaert
method based on the normalized Euclidean distance. For the (128,8)-setting, we
obtained a 100% success rate using a few thousand leakage traces while for
the (64,4)-setting a few hundred were sufficient. We did not try to optimize this
stage of the attack (in particular we did not use the Bayesian extension proposed 
in El) as it requires a very small amount of leakage traces compared to the next 
stage. 

³ ATMega 32A, 8-bit architecture, 8 MHz.

⁴ The multivariate SNR is defined as the ratio of the interclass generalized
variance (i.e. the determinant of the leakage means covariance matrix) over the
intraclass generalized variance (i.e. the determinant of the noise covariance
matrix) to the power $1/d$ (where $d$ is the dimension, equal to 6 in our case).

⁵ The univariate SNR is defined as the variance of the means over the variance
of the noise.

Fig. 1. Stage 2.1 for the (128,8)-setting: success rate over an increasing
number of leakage traces (in $\log_2$-scale) for a computing power of $2^k$ with
$k \in \{0,1,8,32\}$


Fig. 2. Stage 2.1 for the (64,4)-setting: success rate over an increasing
number of leakage traces (in $\log_2$-scale) for a computing power of $2^k$ with
$k \in \{0,1,8,32\}$


Stage 2.1. For this stage (recovery of $A$, $k_2$, $S(0), S(1), \ldots, S(s-1)$)
we fixed the number $s$ of s-box outputs in the system to 14 for the
(128,8)-setting and to 10 for the (64,4)-setting (according to the suggested
formula $s = n + 2 - 32/m$). For both settings, we chose a precision quality
parameter $q = 0.5$ for the building of the template basis and we simulated the
attack for a computing power of $2^k$ with $k \in \{0, 8, 16, 32\}$ (i.e. $2^k$
systems among the likeliest ones are tested). The obtained success rates are
plotted in Figure 1 for the (128,8)-setting and in Figure 2 for the
(64,4)-setting. Each curve represents a different computing power. Naturally the
leftmost curves (i.e. the most successful) correspond to the $2^{32}$ computing
power and the rightmost ones to the $2^0$ computing power. As one can see, with
a reasonable computing power, a 100% success rate is reached with fewer than
$2^{16}$ leakage traces for the (128,8)-setting, and with fewer than $2^{13}$
leakage traces for the (64,4)-setting.
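As a quick check of the suggested formula, with $n = 16$ in both settings:

$$s = 16 + 2 - \frac{32}{8} = 14 \quad \text{for the (128,8)-setting}, \qquad s = 16 + 2 - \frac{32}{4} = 10 \quad \text{for the (64,4)-setting}.$$

Since the s-box has $2^m$ outputs in total, this leaves $2^8 - 14 = 242$ outputs to recover afterwards in the (128,8)-setting and $2^4 - 10 = 6$ in the (64,4)-setting.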



Fig. 3. Stage 2.1 for the (128,8)-setting: success rate over an increasing
number of leakage measurements (in $\log_2$-scale) for an estimation quality
$q = 0.1$


Fig. 4. Stage 2.1 for the (64,4)-setting: success rate over an increasing
number of leakage measurements (in $\log_2$-scale) for an estimation quality
$q = 0.1$

For the (128,8)-setting the precision quality $q = 0.5$ makes our mean
estimations converge after 1024 leakage samples per value
$\beta \in \mathbb{F}_{256}$. Since 16 samples are provided per leakage trace
(one for each s-box in the first round), this makes a data complexity of
$2^{14}$ leakage traces for building the template basis. As we need around
$2^{16}$ leakage traces to get a 100% success rate in stage 2.1 we might get a
better overall attack complexity by improving the estimation precision a little
bit. In order to see the kind of improvement we could get from a better
estimation, we also performed attack simulations for a precision quality
$q = 0.1$, implying an increase of the data complexity to $2^{17}$ leakage
traces for the template basis. The obtained success rates are given in Figure 3.
We get a 100% success rate with between $2^{14}$ and $2^{14.5}$ leakage traces
for all computing powers except for $k = 0$, which requires $2^{15}$ traces.

For the (64,4)-setting, the estimated means converge after 2048 samples per
value $\beta \in \mathbb{F}_{16}$, making a data complexity of 2048 leakage
traces for the template basis. Here again we also performed attack simulations
for a precision quality of $q = 0.1$ (see results in Figure 4). We get a data
complexity of $2^{13}$ leakage traces for the template basis and around
$2^{12.5}$ leakage traces for the system solving. This precision therefore
seems to give the best tradeoff for the (64,4)-setting.
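The template-basis figures above follow from simple counting, which can be checked as below. The sketch assumes that the first-round s-box inputs are spread evenly over the $2^m$ possible values, so that each trace contributes $n$ samples shared among them; the function names are illustrative only.

```python
def template_traces(samples_per_value, m, n):
    """Traces needed so that each of the 2^m values gets samples_per_value
    samples, when every trace provides n samples (one per first-round s-box)."""
    return samples_per_value * (1 << m) // n

def samples_per_value(traces, m, n):
    """Inverse computation: samples collected per value from a number of traces."""
    return traces * n // (1 << m)

# (128,8)-setting, q = 0.5: 1024 samples per value.
traces_128_8 = template_traces(1024, m=8, n=16)
# (64,4)-setting, q = 0.5: 2048 samples per value.
traces_64_4 = template_traces(2048, m=4, n=16)
# q = 0.1: samples per value implied by the reported trace counts.
implied_128_8 = samples_per_value(2 ** 17, m=8, n=16)
implied_64_4 = samples_per_value(2 ** 13, m=4, n=16)
```

Under this even-spread assumption, the reported $2^{17}$ and $2^{13}$ traces for $q = 0.1$ both correspond to the same number of samples per value in the two settings.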



Fig. 5. Number of leakage traces to get a 90% success rate over an increasing SNR in 
[0.1; 1] for the (128,8)-setting (green curve) and the (64,4)-setting (red curve) 

In order to observe the impact of the SNR on the data complexity we performed
attack simulations in which we weighted the noise covariance matrix in order to
get some desired multivariate SNR between 0.1 and 1. For both settings, we fixed
the estimation quality to $q = 0.5$ and the computing power to $2^{16}$. Figure
5 plots the required number of leakage traces to obtain a 90% success rate with
respect to the multivariate SNR. We observe a strong impact of the SNR on the
attack efficiency. In particular for an SNR close to 1 our attack only requires
a few thousand traces.
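One simple way to reach a target multivariate SNR is to multiply the noise covariance matrix by a scalar. This is an assumption made for illustration (the text does not specify how the weighting was done). With the determinant-based definition of the multivariate SNR, scaling $\Sigma$ by $w$ multiplies its determinant by $w^d$ and hence divides the SNR by $w$, so $w = \mathrm{SNR}_{\text{current}} / \mathrm{SNR}_{\text{target}}$. The matrices below are small placeholders, not the profiled parameters.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                      # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def multivariate_snr(means_cov, noise_cov):
    """(det of means covariance / det of noise covariance) to the power 1/d."""
    dim = len(noise_cov)
    return (det(means_cov) / det(noise_cov)) ** (1.0 / dim)

def scale_noise(noise_cov, current_snr, target_snr):
    """Scalar weighting of the noise covariance that yields target_snr."""
    w = current_snr / target_snr
    return [[w * v for v in row] for row in noise_cov]

# Placeholder 2-dimensional example.
means_cov = [[2.0, 0.5], [0.5, 1.0]]
noise_cov = [[4.0, 1.0], [1.0, 3.0]]
snr = multivariate_snr(means_cov, noise_cov)
scaled = scale_noise(noise_cov, snr, target_snr=1.0)
new_snr = multivariate_snr(means_cov, scaled)
```

The same scaling applies unchanged to the 6-dimensional case, since the exponent $1/d$ cancels the $w^d$ factor of the determinant.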

Stage 2.2 and 3. The recovery of the remaining s-box outputs based on the
maximum likelihood approach is very efficient. Taking a lower bound of 0.999 on
the likelihood to decide that a candidate is the correct one, the attack stops
after 640 leakage traces on average and reaches a 97% success rate for the
(128,8)-setting (a tighter likelihood bound would yield a 100% success rate).
For the (64,4)-setting, it stops after 10 leakage traces on average and reaches
a 100% success rate. The high efficiency of the attack for the (64,4)-setting
comes from the fact that it only has to recover 6 remaining s-box outputs.
Therefore the likelihoods quickly converge.

We did not implement attack simulations for the third stage but we would
clearly get figures comparable to those of stage 2.2, i.e. negligible data
requirements compared to stage 2.1, which is clearly the bottleneck of our
attack.

6 Discussions and Perspectives 

In this paper we have described a generic SCARE attack against a wide class 
of SPN block ciphers. The attacker model defined in Section 3.1 assumes that
colliding s-box computations can be detected from the side-channel leakage. We 
have first investigated the case of perfect collision detection and then we have 
extended our attack to deal with noisy leakages. 


About the Attacker Model. As mentioned in Section 3.1 (Remark 2), our
attacker model implicitly means that the cipher implementation processes the 
s-box computations in a sequential way, which is therefore more suited for soft- 
ware implementations. This makes sense for secret ciphers which are rarely im- 
plemented at the hardware level. Note that it is also common to use a sequential 
approach for the s-box computations in light-weight hardware implementations 
of block ciphers, and our attack naturally applies to this context. Our model 
further implicitly assumes that two s-box computations with the same input at 
two different points in the execution produce identical side-channel leakages (or 
identically distributed in the noisy context). Although this assumption seems 
fair in practice, it might not always be satisfied. It was for instance observed in 
[1715m that for some software implementations the side-channel leakage of an s- 
box computation may vary according to the s-box index and the target register. 
For such implementations, it might not be possible to detect collisions between 
two s-box computations at different indices. This issue can be addressed by
considering each s-box index independently, which amounts to dealing with the multiple
s-boxes setting studied in the full version of the paper (except that we need to 
recover a single s-box). In this context, one only detects collisions between s-box 
computations at the same index. Note that our attack still assumes that s-box 
computations at a given index leak identically in the successive rounds. 


Countermeasures to Our Attack. Our work shows that under a practically 
relevant assumption, it is possible to retrieve the complete secret design of a 
block cipher with a common SPN structure. This clearly emphasizes that the se- 
crecy of the design is not sufficient to prevent side-channel attacks, and that one 
should include countermeasures to the implementation of secret ciphers as well. 
A typical choice for block cipher implementations in software is to use masking
with table recomputation for the s-box (see for instance [2311]). As studied
by Roche and Lomné in EDI. such a countermeasure only prevents collision
detections between different cipher executions but it still allows the detection of
intra-execution collisions. In a variant of their attack against AES-like secret 
ciphers, Clavier et al. take this constraint into account in order to bypass the 
masking countermeasure with table recomputation m- Our attack in the ideal- 
ized leakage model (perfect collision detection) could also be extended to work 
with this constraint. It would be more tricky in the presence of noise as averaging 
would not be an option anymore, but our attack could still be generalized using 
a similar approach as [3U]. In order to thwart our attack, one should therefore 
favor masking schemes enabling the use of different masks for the different s-box 
computations (see for instance [2618] 1. so that intra-execution collisions would 
not be detectable anymore. Another common software countermeasure is oper- 
ation shuffling (see for instance [22]). This countermeasure has a direct impact 
on our attack as it randomizes the indices of the s-box computations from one 
execution to another. As shown by Clavier et al. m such a countermeasure 
can be simply bypassed in the idealized leakage model. However, it seems more 
complicated to deal with in a noisy leakage model especially if combined with 
masking. We therefore suggest using such a combination of countermeasures
against our attack.
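The effect of operation shuffling on the s-box indices can be pictured with the following minimal sketch. It only illustrates the randomization of the processing order from one execution to another; it is not a hardened implementation (Python offers no control over the actual leakage), and the function name is hypothetical.

```python
import secrets

def shuffled_sbox_layer(state, sbox, rng=None):
    """Apply the s-box to every state word, visiting the words in a fresh
    random order at each call. Returns the new state and the order used."""
    rng = rng or secrets.SystemRandom()
    order = list(range(len(state)))
    rng.shuffle(order)                  # new random processing order
    out = [0] * len(state)
    for idx in order:                   # the k-th computation handles word order[k]
        out[idx] = sbox[state[idx]]
    return out, order

# Toy 4-bit s-box layer on a 16-word state (placeholder values).
toy_sbox = [(7 * v + 3) % 16 for v in range(16)]    # a bijection on 0..15
toy_state = [v % 16 for v in range(5, 21)]
new_state, used_order = shuffled_sbox_layer(toy_state, toy_sbox)
```

The output state is the same as without shuffling; only the position in time of each s-box computation changes, which is what prevents an attacker from directly matching a leakage to an s-box index.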

Perspectives. Our work opens several interesting issues for further research. 
First, our attack could probably be improved by using better/optimal approaches
to solve the set of noisy equations arising in Stage 2.1 (see Section 4.2). One
could for instance follow the approach of [18116] by rewriting the system as a 
decoding problem. Our attack could also be improved by considering a known 
ciphertext scenario (as e.g. done in p~S] 1 . On the other hand, our attack was only 
validated by simulations (although from a practically inferred leakage model). 
It would be interesting to mount the attack against a real implementation of a 
secret SPN cipher e.g. on a smart card, to check how the different steps work in 
practice. Another interesting direction would be to investigate extensions of our 
attack against protected implementations in order to determine to what extent 
an implementation should be protected in practice.